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CARLESON'S THEOREM: 
PROOF, COMPLEMENTS, VARIATIONS 

MICHAEL T. LACEY 
GEORGIA INSTITUTE OF TECHNOLOGY 



1. Introduction 



L. Carleson's celebrated theorem of 1965 [24] asserts the pointwise convergence of the 
partial Fourier sums of square integrable functions. We give a proof of this fact, in particular 
the proof of Lacey and Thiele [65], as it can be presented in brief self contained manner, and 
a number of related results can be seen by variants of the same argument. We survey some 
of these variants, complements to Carleson's theorem, as well as open problems. 1 

We are concerned with the Fourier transform on the real line, given by 

/(£) = / e-**/(a:) dx 



This work has been supported by an NSF grant. 

1 This paper is an extended version of the publication [54]. 
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for Schwartz functions /. For such functions, it is an important elementary fact that one has 
Fourier inversion, 

l r N - 

(1.1) /(x) = Jim - /(fle^de, xGR) 

N^oo ZTT J_ n 

the inversion holding for all Schwartz functions /. Indeed, 

-iY 



1 r v - 

r / /(0^ d£ = D N * f(x) 

!7r J-N 



2tt 

where D N (x) := smNx is the Dirchlet kernel. 

The convergence in (1.1) for Schwartz functions follows from the classical facts 

30 

D N (x) dx = 1, 

oo 

lim / Div(x) dx = 0, e > 0. 

\x\>e 



'N 

N— >oo 



L. Carleson's theorem asserts that (1.1) holds almost everywhere, for / G L 2 (R). The form 
of the Dirchlet kernel already points out the essential difficulties in establishing this theorem. 
That part of the kernel that is convolution with - corresponds to a singular integral. This 
can be done with the techniques associated to the Calderon Zygmund theory. In addition, 
one must establish some uniform control for the oscillatory term siniVa;, which falls outside 
of what is commonly considered to be part of the Calderon Zygmund theory. 

For technical reasons, we find it easier to consider the equivalent one sided inversion, 

1 



(1.2) f(x) = lim — / /(fle** dC 

Schwartz functions being dense in L 2 , one need only show that the set of functions for 
which a.e. convergence holds is closed. The standard method for doing so is to consider the 
maximal function below, which we refer to as the Carleson operator 

/N 
/iO'"' £ di , x G R. 
-oo 

There is a straight forward proposition. 

Proposition 1.4. Suppose that the Carleson operator satisfies 

(1.5) \{Cf(x) > A}| < \- 2 \\f\\l, f G L 2 (R), A > 0. 

Then, the set of functions f G L 2 (M) for which (1.2) holds is closed and hence all of L 2 (R). 
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Proof. For / G L 2 {R), we should see that 



limsup|/(x) - — / /(e)e^ d£\ = a.e 




To do so, we show that for all e > 0, \{Lf > e}| < e. We take g to be a smooth compactly 



This is a standard proposition, which holds in a general context, and serves as one of the 
prime motivations for considering maximal operators. Note in particular that we are not at 
this moment claiming that C is a bounded operator on L? . Inequality (1.5) is the so called 
weak L? bound, and we shall utilize the form of this bound in a very particular way in the 
proof below. 

It was one of L. Carleson's great achievements to invent a method to prove this weak type 
estimate. 

Carleson's Theorem. The estimate (1-5) holds. As a consequence, (1.2) holds for all 
f G L 2 (R) ; for almost every x G R. 

Carleson's original proof [24] was extended to L p , 1 < p < oo, by R. Hunt. Also see [88]. 
Charles Fefferman [38] gave an alternate proof that was influential by the explicit nature of 
it's "time frequency" analysis, of which we have more more to say below. We follow the proof 
of M. Lacey and C. Thiele [65]. More detailed comments on the history of the proof, and 
related results will come later. 

The proof will have three stages, the first being an appropriate decomposition of the Car- 
leson operator. The second being an introduction of three Lemmas, which can be efficiently 
combined to give the proof of our Theorem. The third being a proof of the Lemmas. 

We do not keep track of the value of generic absolute constants, instead using the notation 
A < B iff A < KB for some constant K. And A ~ B iff A < B and B < A. The notation 
1a denotes the indicator function of the set A. For an operator T, ||T|| P denotes the norm 
of T as an operator from L p to itself. 

Acknowledgment. These notes are based on a series of lectures given at the Erwin Schrodinger 
Institute, in Vienna Austria. I an indebted to the Institute for the opportunity to present 
these lectures. 
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The Fourier transform is a constant times a unitary operator on L 2 (IR). In particular, we 
shall take the Plancherel's identity for granted. 

Proposition 2.1. For all f,ge L 2 (M) ; 

(f,9) = c(f,g) 

for appropriate constant c = . 

The convolution of / and ip is given by f*ip(x) = f f{x — y)ij}{y) dy. We shall also assume 
the following Lemma. 

Lemma 2.2. If a bounded linear operator T on L 2 (R) commutes with translations, then 
Tf = ip * f , where ip is a distribution, which is to say a linear functional on Schwartz 
functions. In addition, the Fourier transform ofTf is given by 

Tf = $f. 

Let us introduce the operators associated to translation, modulation and dilation on the 
real line. 

(2.3) Tr y f(x):=f(x-y), 

(2.4) Modtf(x):=e* x f(x), 

(2.5) Dil* f( x ) : = A~ 1/J 7(:r/A), < p < oo, A > 0. 

Note that the dilation operator preserves LP norm. These operators are related through the 
Fourier transform, by 

(2.6) Tr y = Mod_ y , Mod^ = Tr ? , Dil 2 = Dil 2 /A 

And we should also observe that the Carleson operator commutes with translation and di- 
lation operators, while being invariant under modulation operators. For any y, £ e R, and 
A>0, 

Tr ?/ oC = CoTr i Dil 2 o C = C o Dil 2 , C o Mod^ = C. 
Thus, our mode of analysis should exhibit the same invariance properties. 

We have phrased the Carleson operator in terms of modulations of the operator P_ f(x) = 
f-oc /(0 e "^ d£i which is the Fourier projection on to negative frequencies. Specifically, since 
multiplication of / by an exponential is associated with a translation of /, we have 

(2.7) C7 = sup|P_(e lJV 7)l 

N 
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Figure 1 . Four different aspects ratios for tiles. Each fixed ratio gives rise to 
a tiling for the time frequency plane. 

A characterization of the operator P_ will be useful to us. 

Proposition 2.8. Up to a constant multiple, P_ is the unique bounded operator on L 2 (R) 
which (a) commutes with translation (b) commutes with dilations (c) has as it's kernel pre- 
cisely those functions with frequency support on the positive axis. 

Proof. Let T be a bounded operator on L 2 (K) which satisfies these three properties. Condi- 
tion (a) implies that T is given by convolution with respect to a distribution. Such operators 

are equivalently characterized in frequency variables by T f = rf for some bounded function 
r. Condition (b) then implies that r(£) = t(£/|£|) for all £ 7^ 0. A function / is in the kernel 
of T iff / is supported on the zero set of r. Thus (c) implies that r is identically on the 
positive real axis, and non-zero on the negative axis. Thus, T must be a multiple of P_. □ 

We move towards the tool that will permit us to decompose Carleson operator, and take 
advantage of some combinatorics of the time-frequency plane. We let Dbea choice of dyadic 
grids on the line. Of the different choices we can make, we take the grid to be one that is 
preserved under dilations by powers of 2. That is 

(2.9) V = {[j2 k ,(j + l)2 k ) : j,keZ} 

Thus, for each interval I &T> and fceZ, the interval 2 k I = {2 k x : x G 1} is also in T>. 

A tile is a rectangle s ED xT> that has area one. We write a tile as s = I x u, thinking of 
the first interval as a time interval and the second as frequency. The requirement of having 
area one is suggested by the uncertainty principle of the Fourier transform, or alternatively, 
our calculation of the Fourier transform of the dilation operators in (2.6) . Let T denote 
the set of all tiles. While tiles all have area one, the ratio between the time and frequency 
coordinates is permitted to be arbitrary. See Figure 2 for a few possible choices of this ratio. 
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Figure 2. Some of the tiles that contribute to the sum for Q^. The shaded 
areas are the tiles I s x u s+ . 

Each dyadic interval is a union of its left and right halves, which are also dyadic. For an 
interval u we denote these as u>- and u + respectively. We are in the habit of associating 
frequency intervals with the vertical axis. So u;_ will lie below uj + . Associate to a tile 
s — I s x u s the rectangles s± = I s x u a ±. These two rectangles play complementary roles in 
our proof. 

Fix a Schwartz function with l[_i/ 9) i/g] < <p < l[_jy 8)1 / 8 ]. Define a function associated 
to a tile s by 

(2.10) = Mod cK „) Tr c(7s) Dilf 7s <p 

where c( J) is the center of the interval J. Notice that tp s has Fourier transform supported on 
uj s -, and is highly localized in time variables around the interval I s . That is, ip s is essentially 
supported in the time-frequency plane on the rectangle I s x uo s _. Notice that the set of 
functions {(f s : s G T} has a set of invariances with respect to translation, modulation, and 
dilation that mimics those of the Carleson operator. 

It is our purpose to devise a decomposition of the projection P_ in terms of the tiles just 
introduced. To this end, for a choice of £ G M, let 

(2.H) Qe/ = X) WW'^- 

seT 

We should consider general values of £ for the reason that the dyadic grid distinguishes certain 
points as being interior, or a boundary point, to an infinite chain of dyadic intervals. And 
moreover, for a given £, only certain tiles can contribute to the sum above, those tiles being 
determined by the expansion of £ in a binary digits. See Figure 2. Let us list some relevant 
properties of these operators. 

Proposition 2.12. For any £, the operator Qg is a bounded operator on L? , with bound 
independent o/£. Its kernel contains those functions with Fourier transform supported on 
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[£, oo), and it is positive semidefinite. Moreover, for each integer k, 

(2.13) Q ? = Dil 2 _ fc Q ?2 _ fc Dil 2 fc 

(2.14) Q c , fc Tr 2fc =Tr 2fc g 5ifc , 

where k = set l Wi+ (£)(/, 

|/ s |<2 fc 

Proof. Let {c^(n) : n G Z} be the set of dyadic intervals for which £ e u;(n)+, listed in 
increasing order, thus • • • C u;(ra) C cj(n + 1) C • ■ • . Let T(n) = {s e T : w s = ^(^)}, and 

Q(n)/= E (f,<Ps)<Ps- 
seT(n) 

The intervals cj(n)_ are disjoint in n, and since y2 s has frequency support in u s _, it follows 
that the operators Q( n ) are orthogonal in n. The boundedness of Q ? reduces therefore to the 
uniform boundedness of in n. 

Two operators Q,-^ and Q( n /) differ by composition with a modulation operator and a 
dilation operator that preserves L? norms. Thus, it suffices to consider the L? norm bound of 
a Q( n ) with | is | = 1 for all s e Tin). Using the fact that ip s is a rapidly decreasing function, 
we see that 

(2.15) \(<Ps,<Ps>)\ < dist(I s , I s ,)-\ 

Now that the spatial length of the tiles is one, the tiles are separated by integral distances. 
Since Yln n ~ A < 00 > 

HQ(«) /III = E E (f^s)(<Ps,<Ps')(<Ps',f) 
seT(n) s'eT(n) 

< SUp ^ \(f,(Ps)(f,(P(I s +n)xu(n))\ 
n& seT(n) 

< E k/^.>i 2 - 

seT(n) 

The last inequality following by Cauchy-Schwarz. The last sum is easily controlled, by simply 
bringing in the absolute values. Since \(f,(p s )\ 2 < J|/| 2 |y s | dx 



E \(f^ s )\ 2 < [\f\ 2 E m** 

sST(n) J seT(n) 

< 11/H2SUP E \v*( x ) 



sST(n) 

This completes the proof of the uniform boundedness of Q £ . 



<\\f\\l 
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Since all of the functions <p a that contribute to the definition of have frequency support 
below £, the conclusion abut the kernel of the operator is obvious. And that it is positive 
semidefinite, observe that 

(2.16) (Qe/,/>= £ I </,*>.) I 2 >o. 

seT 

In particular, (Q* (p s , ip s ) ^ for s G T(n). 

To see (2.13) recall (2.6) and our specific choice of grids, (2.9) . To see (2.14) , observe 
that if I G T> has length at most 2 fc , then / + 2 k is also in T>. 

□ 

As the lemma makes clear, Mod_£ Mod^ serves as an approximation to P_. A limiting 
procedure will recover P_ exactly. Consider 

(2.17) Q= lim / Dil^ A Tr_ 2/ Mod_ ? Q £ Mod ? Tr 2/ Dil^//(dA,dy,dO- 

v - x .//;•; v. 

Here, B{Y) is the set [1,2] x [0,Y] x [0,F], and /x is normalized Lebesgue measure. Notice 
that the dilations are given in terms of 2 A , so that in that parameter, we are performing an 
average with respect to the multiplicative Haar measure on IR + . 

Apply the right hand side to a Schwartz function /. It is easy to see that as k — > — oo, the 
terms 

Mod^ Tr^ Dil^A Q c , fc / 

tend to zero uniformly in the parameters £, y, and A, with a rate that depends upon /. Here, 
Q^ k is as in (2.14) . Similarly, as k — > oo, the terms 

Mod^ Tr y DilgA (Q - Q^k)f 

also tend to zero uniformly. Hence, the limit is seen to exist for all Schwartz functions. By 
Proposition 2.12, it follows that Q is a bounded operator on L 2 . That Q is translation and 
dilation invariant follows from (2.14) and (2.13) . Its kernel contains those functions with 
Fourier transform supported on [0, oo). Finally, if we verify that Q is not identically zero, we 
can conclude that it is a multiple of P_. But, for e.g. / = Mod_i/ 8 y?, it is easy to see that 

(Q s Mod 5 Tr^ Dil^ A /, Mod e Ti y Dil^ /) > 

so that Qf 7^ 0. Thus, Q is a multiple of P_. 

We can return to the Carleson operator. An important viewpoint emphasized by C. Feffer- 
man's proof [38] is that we should linearize the supremum. That is we consider a measurable 
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map N : R i— *■ R, which specifies the value of iV at which the supremum in (1.3) occurs. 
Then, it suffices to bound the operator norm of the linear (not sublinear) operator 

P Mod N{x) 

Considering (2.17) , we set 

C N f(x) = J2^s + (N(x))(f,ip s )cp s (x). 
seT 

Our main Lemma is then 

Lemma 2.18. There is an absolute constant K so that for all measurable functions N : 
R i— > R, we have the weak type inequality 

(2.19) \{C N f > A}| < \- 2 \\f\& A > 0, / G L 2 (R). 

By the convexity of the weak L 2 norm, this theorem immediately implies the same estimate 
for P_ Mo(1n(x), and so proves Theorem 1. The proof of the lemma is obtained by combining 
the three estimates detailed in the next section. 

2.1. Complements. At the conclusion to the different sections of the proof, some comple- 
ments to the ideas and techniques of the previous sections will be mentioned, but not proved. 
These items can be considered as exercises. 

Remark 2.20. For Schwartz functions ip and ip, set 

A/:=^(/,Tr n ^)Tr n V> 

Bf:=f Tr_ v ATrj, dy 
Jo 

Then, B is a convolution operator, that is B / = ^ * f for some function ty, which can be 
explicitly computed. 

Remark 2.21. The identity operator is, up to a constant multiple, the unique bounded op- 
erator A on L 2 which commutes with all translation and modulation operators. That is 
A : L 2 i-f L 2 , and for all y G R and £ G R, 

Tr y A = A Tr y , Mod c A = A Mod c . 
Remark 2.22. The operators 

A,- / := (f 

seT 

|/.|=2* 

are uniformly bounded operators on L 2 (R), assuming that ip is a Schwartz function. 
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Remark 2.23. Assuming that tp ^ 0, the operator below is a non-zero multiple of the identity 
operator on L 2 . 

I j Tr_ y Mod_ 5 Aj Mod 5 Tr y d£dy. 
Jo Jo 

3. The Central Lemmas 

Observe that the weak type estimate of Lemma 2.18 is implied by 
(3-1) \(C N f,l E )\< ll/l| 2 |£| 1/2 , 

for all functions / and sets E of finite measure. (In fact this inequality is equivalent to the 
weak L 2 bound.) 

By linearity of Cn, we may assume that ||/||2 = 1- By the invariance of Cn under dilations 
by a factor of 2 (with a change of measurable N(x)), we can assume that 1/2 < \E\ < 1. Set 

(3.2) S = (l Ws+ o AOyv 

We shall show that 

(3-3) ^|(/,^)(0 S ,1 E )|<1. 

seT 

To help keep the notation straight, note that <p s is a smooth function, adapted to the tile. 
On the other hand, <j) s is the rough function paired with the indicator set l Wa+ N. From this 
point forward, the function / and the set E are fixed. We use data about these two objects 
to organize our proof. 

As the sum above is over strictly positive quantities, we may consider all sums to be taken 
over some finite subset of tiles. Thus, there is never any question that the sums we treat are 
finite, and the iterative procedures we describe will all terminate. The estimates we obtain 
will be independent of the fact that the sum is formally over a set of finite tiles. 

We need some concepts to phrase the proof. There is a natural partial order on tiles. Say 
that s < s' iff uj s Z> uj s i and I s C I s t. Note that the time variable of s is localized to that of 
s', and the frequency variable of s is similarly localized, up to the variability allowed by the 
uncertainty principle. Note that two tiles are incomparable with respect to the '<' partial 
order iff the tiles, as rectangles in the time frequency plane, do not intersect. A "maximal 
tile" will one that is maximal with respect to this partial order. See figure 1. 

We call a set of tiles Tc5a tree if there is a tile It x wt, called the top of the tree, such 
that for all s G T, s < It X ujt- We note that the top is not uniquely defined. An important 
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point is that a tree top specifies a location in time variable for the tiles in the tree, namely 
inside It, and localizes the frequency variables, identifying u; T as a nominal origin. 

We say that S has count at most A, and write 

count (S) < A 

iff S is a union Utgt where each T G T is a tree, and 

^|/ T | < A. 

TeT 

Fix x( x ) = (1 + where k is a large constant, whose exact value is unimportant to 

us. Define 

(3.4) X/:=Tr c(7) Dilf /|X , 



(3.5) dense(s) := sup / Xi s i dx, 

s<s' JN- 1 ^,) 

dense (iS) := sup dense (s), S C T. 
ses 

The first and most natural definition of a "density" of a tile, would be \I s \~ 1 \N~ 1 (u s +) PI I 3 \. 
But (p is supported on the whole real line, though does decay faster than any inverse of a 
polynomial. We refer to this as a "Schwartz tails problem." The definition of density as 
f N -i/ u \ Xis dx, as it turns out, is still not adequate. That we should take the supremum over 
s < s' only becomes evident in the proof of the "Tree Lemma" below. 

The "Density Lemma" is 
Lemma 3.6. Any subset S C T is a union o/5hcav y and 5n g ht for which 

dense (iSiight) < | dense («S), 

and the collection iSheavy satisfies 

(3.7) count (5 h eavy )< dense (5) 

What is significant is that this relatively simple lemma admits a non-trivial variant inti- 
mately linked to the tree structure and orthogonality. We should refine the notion of a tree. 
Call a tree T with top It X uj^ a ±tree iff for each s G T, aside from the top, It X f]I s x u) 8 ± 
is not empty. Any tree is a union of a +tree and a —tree. If T is a +tree, observe that the 
rectangles {I s x u s _ : s G T} are disjoint. And, by the proof of Proposition 2.12, we see 
that 



A(T) 2 :=^|(./> s }| 2 < 



seT 
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This motivates the definition 

(3.8) size(S) := sup{|/ T r 1/2 A(T) : T C S, T is a +tree}. 

The "Size Lemma" is 

Lemma 3.9. Any subset S C T is a union of Su g and «S sma u for which 

size(<S sma ii) < \ size(S), 

and the collection Sug satisfies 

(3.10) count(iSbig) < size(5) 2 . 

Our final Lemma relates trees, density and size. It is the "Tree Lemma." 
Lemma 3.11. For any tree T 

(3.12) ^|<j> s >(0 s ,l B }| < |/ T |size(T)dense(T). 

seT 

The final elements of the proof are organized as follows. Certainly, dense (T) < 2 for 
k sufficiently large. We take some finite subset S of T, and so certainly size (S) < oo. 
If size(iS) < 2, we jump to the next stage of the proof. Otherwise, we iteratively apply 
Lemma 3.9 to obtain subcollections S n C S, n > 0, for which 

(3.13) size(S n ) < 2 n , n > 0, 
and S n satisfies 

(3.14) count (S n ) < 2~ 2n . 

We are left with a collection of tiles S' = S — Un>o^™ which has both density and size at 
most 2. 

Now, both Lemma 3.6 and Lemma 3.9 are set up for iterative application. And we should 
apply them so that the estimates in (3.7) and (3.10) are of the same order. (This means that 
we should have density about the square of the size.) As a consequence, we can achieve a 
decomposition of S into collections S n , n G Z, which satisfy (3.13) , (3.14) and 

(3.15) dense(5 n ) < min(2, 2 2n ). 

Use the estimates (3.12) — (3.15) . Write S n as a union of trees T e T n , this collection of 
trees satisfying the estimate of (3.14) . We see that 

Y,\(f,vs)(<p s ,iE)\= Yl Dtf'V-W" 1 *)! 

ses n tgt„ s6T 

(3.16) <2"min(2,2 2 ") ^ |/ T | 

TeT„ 
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< min(2- n ,2 n ). 

This is summable over n G Z to an absolute constant, and so our proof (3.3) is complete, 
aside from the proofs of the three key lemmas. 

3.1. Complements. 

Remark 3.17. These two conditions are equivalent. 

supA- 2 |{/>A}|<l, 

A>0 

/ |/| dx < \E\ 1/2 , \E\<oo. 
Je 

Remark 3.18. Let A be an operator for which Dil^ A = ADil 2 ^ for all k G TL. Suppose that 
there is an absolute constant K so that for all functions / G L 2 (R) of norm one, 

|{A/>1}|<K 

Then for all A > 0, 

|{A/>A}|<A- 2 ||/|| 2 . 

See [32]. 

Remark 3.19. For any +tree T, 

£it/> s >i 2 < [\f\ 2 Tr c{lT) m? T xdx. 

Moreover, one has the inequality 

I/tI-^K/,^)! 2 ^ mfM|/| 2 (x). 

Here, M is the maximal function, 

M/(x) = sup(2t)- 1 [ \f(x-y)\dy. 

t>o J-t 

4. The Density Lemma 
Set S = dense(5). Suppose for the moment that density had the simpler definition 

\N-\cu s )ni s \ 

dense(s) := — . 

The collection <S>h ea vy ^ s *° ^ e a umon of trees. So to select this collection, it suffices to select 
the tops of the trees in this set. 
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Select the tops of the trees, Tops as being those tiles s with dense(s) exceeding 5/2, which 
are also maximal with respect to the partial order '<.' The tree associated to such a tile 
s G Tops would just be all those tiles in S which are less than s. The tiles in Tops are pairwise 
incomparable with respect to the partial order '<,' and so are pairwise disjoint rectangles in 
the time-frequency plane. And so the sets N~ 1 (u s ) C\I S C E are pairwise disjoint, and each 
has measure at least | |ig|. Hence the estimate below is immediate. 

(4-1) El J ^ rl 

sGTops 

The Schwartz tails problem prevents us from using this very simple estimate to prove this 
lemma, but in the present context, the Schwarz tails are a weak enemy at best. Let Tops be 
those s 6 5 which have dense (s) > 5/2 and are maximal with respect to '<.' It suffices to 
show (4.1) . For an integer k > 0, and small constant c, let Sk be those s G Tops for which 

(4.2) \2 k I s n N-\uj s )\ > c2 2k 5\I s \. 

Every tile in Tops will be in some Sk, with c sufficiently small, and so it suffices to show that 



(4.3) ^|/ s |<2- fc r 1 . 



Fix k. Select from Sk a subset S' k of tiles satisfying {2 k I s x oj s : s G S' k } are pairwise 
disjoint, and if s G Sk and s' G S' k are tiles such that 2 k I s x uj s and 2 k I s t x uj s i intersect, 
then \I S \ < \I S >\- It is clearly possible to select such a subset. And since the tiles in Sk are 
incomparable with respect to '<', we can use (4.2) to estimate 

£W<2*+'£|/,| 

< ^2- k 5-\ 

— c 

That is, we see that (4.3) holds, completing our proof. 
4.1. Complements. 

Remark 4.4. Let S be a set of tiles for which there is a constant K so that for all dyadic 
intervals J, 



IsCJ 



Then for all 1 < p < oo, and intervals J, 

II Ei- 



ses 



< K I 7l 1/p 
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In fact, K p < p. 

5. The Size Lemma 

Set a = size(«S). We will need to construct a collection of trees T G Ti arge , with S\ aTgc 
I Jrp c rf, T, and 

W i fc i large ' 



(5-1) £ |/t|<<t- 



TeT 



large 



as required by (3.10) . 



The selection of trees T G Ti arge will be done in conjunction with the construction of +trees 
T + G Ti arge+ . This collection will play a critical role in the verification of (5.1) . 

The construction is recursive in nature. Initialize 

^stock : — ,5 j Ti arge := 0, Ti arge+ := 0. 
While size(<S stock ) > a/2, select a +tree T+ C <S stock with 
(5.2) A(T+) > ||/ T+ |. 

In addition, the top of the tree Jt + x ujt + should be maximal with respect to the partial 
order '<' among all trees that satisfy (5.2) . And c(u;t + ) should be minimal, in the order 
of M. Then, take T to be the maximal tree (without reference to sign) in S stock with top 
I T+ x u T+ . 

After this tree is chosen, update 

^stock ^stock rj-i 

Ti arge • Tj arge U T, Ti arge _(_ . Ti argc _|_ U T_|_. 
Once the while loop finishes, set 5 sma ii := S stock and the recursive procedure stops. 

It remains to verify (5.1) . This is a orthogonality statement, but one that is just weaker 
than true orthogonality. Note that a particular enemy is the is the situation in which 
(ip s ,(p s i) 7^ 0. When u s = u s >, this may happen, but as we saw in the proof of Proposi- 
tion 2.12, this case may be handled by direct methods. Thus we are primarily concerned 
with the case that e.g. cu s /_. 

A central part of this argument is a bit of geometry of the time-frequency plane that 
is encoded in the construction of the +trees above. Suppose there are two trees T ^ T' e 
Ti arge +, and tiles s G T and s' G T' such that uj s _ oj 8 '-i then, it is the case that Is'HIt = 0- 
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It x ujt 



L" 



Figure 3. The proof of strongly disjoint trees. Note that the gray tile could 
be in the tree that was removed after the selection of the H — tree with the top 
indicated above. 



We refer to this property as 'strong disjointness.' It is a condition that is strictly stronger 
than just requiring that the sets in the time-frequency plane below are disjoint in T. 



U 



I. X u s 



T G T 



largc+ 



To see that strong disjointness holds, observe that ojt C uj s uj s '-- Thus wt' lies above 
c^t- That is, in our recursive procedure, T was constructed first. If it were the case that 
I s ' n It 7^ 0, observe that one interval would have to be contained in the other. But tiles 
have area one, thus, it must be the case that I s > C It- That means that s' would have been 
in the tree (the one without sign) that was removed from S stock before T' was constructed. 
This is a contradiction which proves strong disjointness. See Figure 5. 

We use this strong disjointness condition, and the selection criteria (5.2) , to prove the 
bound (5.1) . The method of proof is closely related to the so called TT* method. Set 



<s' = U 



T£Ti ar g e _|_ 



T, and 



ses' 



The operator / i— »■ (f,ip s )cp s is self-adjoint, so that 



a' 



TeT 



largc+ 



< 
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And so, we should show that 

(5-3) \\F\\ 2 2 <a 2 J2 I'tI- 

This will complete the proof. 

This last inequality is seen by expanding the square on the left hand side. In particular, 
the left hand side of (5.3) is at most the sum of the two terms 

s,s'es' 

IaJ s =UJ s i 

(5-5) 2 Y \(f,<Ps)(<Ps, l Ps>}( l Ps>,f}\ 

s,s'es' 

For the term (5.4) , we have the obvious estimate on the inner product 

/ dist(/ s ,/ s Q \-4 

\{^s^s')\<[l + jj| J • 

(Compare to (2.15) .) Thus, by Cauchy-Schwarz, 

(5-4) < ^|(/,^)| 2 <a 2 \^ 

sGS' TpTi i 

J- v J- largc+ 

For the term (5.5) , we need only show that for each tree T, 

(5-6) E \(f><P.)(<P.,<P*)(<P*,f)\ 2 So'W- 

s&T s'es' 

Here, S(s) := {s' G S' — T : cj s _ uv_}. The implied constant should be independent of 
the tree T. 

Now, the strong disjointness condition enters in two ways. For s G T, and s' G S(s), it is 
the case that I s iC\It = 0. But furthermore, for s', s" G S(s), we have e.g. cj s _ C uv- C u} s "-, 
so that I s > fl I s " is also empty. 

At this point, rather clumsy estimates of (5.6) are in fact optimal. The definition of size 
gives us the bound 

\&s>j)\<V\i7W 

And, since uj s C uj s >, we have \I S \ > \I S '\, and I s , and I s > are, in the typical situation, far 
apart. An estimation left to the reader gives 



(5-7) \{¥>s,<Ps')\<V\Is'\\Is\xiMIs')) 
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Thus, we bound the left side of (5.6) by 

Y Y \(f>V»)(V'>V*)(w>f)\~ a *'52\ I '>'\\ I >\Xi.(c(Ii')) 
seT s'eS(s) seT 
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<a 2 



seT 
^ <* 2 \It\ 



Y / i J si^( x ) dx 



(5.8) 

as is easy to verify. This completes the proof of (5.6) , and so finishes the proof of Lemma 3.9. 



5.1. Complements. 

Remark 5.9. Concerning the inequality (5.8) , for any tree T, we have 

Y / \ Is \Xis(x) dx < \I T \. 



seT " j t 

Remark 5.10. Let T be a +tree and set 



F T = Y(f^s) 



5GT 



1/2 



Then, the inequality below is true. 

< size(T)|/ T | 1 / 2 . 

Remark 5.11. With the notation above, assume that G u> T . Then, 

121 1/2 

'IKI 7 ,IWiW 

J 



size(T)~ sup[|J|- 1 ^|(/,^}| 2 ] : 



seT 

IsCJ 







Ft — |^| 1 / Ft 


2 


1/2 


~ sup 


dx 


J 




J j 







where the supremum is over all intervals J. The last quantity is the BMO norm of Ft- 

Remark 5.12. It is an important heuristic that for a collection S of pairwise incomparable 
tiles, the functions {</? s : s 6 5} are nearly orthogonal. The heuristic permits a quantification 
in terms of the following weak type inequality. Let S be a collection of tiles that are pairwise 
incomparable with respect to '<.' Then for all /eL 2 and all A > 0, 



£|/,I<a- 

ses x 



where S\ = {s 6 S : \{f, (p s )\ > Ay|I s |}. Note that this in an inequality about the 
boundedness of a sublinear operator from L 2 (IR) to L 2,00 (M x S). In the latter space, one 
uses counting measure on S. 
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Remark 5.13. Another important heuristic is that the notion of "strong disjointness" for trees 
is as "pairwise incomparable" is for tiles. Let T be a collection of strongly disjoint trees. 
Show that for all / G L 2 and all A > 0, 



I>t|<a- 



2 i I t \\2 

2i 



TGT A 

where T A = {T G T : A(T) > A|/ T |}. 

Remark 5.14. Let S be a collection of tiles that are pairwise incomparable with respect to 
'<.' Show that for all 2 < p < oo, 



i/p 

~ \\j up- 



s€<S 

Notice that the form of this estimate at p = oo is obvious. 

Remark 5.15. Let T be a collection of strongly disjoint trees. Then for all 2 < p < oo, 

TeT 

Remark 5.16. The LP estimates of the previous two complements can in some instances be 
improved. For each integer k, 



[E MAfii < I,/,,, 2< P <oo. 



|/ s |=2 fc 

This can be seen by showing that 

i (/.^) I 



sGT 
|/ s |=2 fe 



6. The Tree Lemma 

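We begin with some remarks about the maximal function, and about a particular variant of it that we shall use at a critical point of this proof. Consider the maximal function

$$Mf=\sup_{I\in\mathcal D}\mathbf 1_I\,|\langle f,\chi_I\rangle|.$$

It is well known that this is bounded on $L^2$. A proof follows. Consider a linearized version of the supremum. To each $I\in\mathcal D$, associate a set $E(I)\subset I$, and require that the sets $\{E(I):I\in\mathcal D\}$ be pairwise disjoint. (Thus, for fixed $f$, $E(I)$ is that subset of $I$ on which the supremum above is equal to $|\langle f,\chi_I\rangle|$.) Define

$$Af=\sum_{I\in\mathcal D}\mathbf 1_{E(I)}\langle f,\chi_I\rangle.$$

We show that $\|A\|_2$ is bounded by a constant, independent of the choice of the sets $E(I)$. The method is that of $TT^*$. Since $A$ has a nonnegative kernel, it suffices to consider positive $f$, and for such $f$

$$\langle f,AA^*f\rangle\le2\sum_{I\in\mathcal D}\sum_{|J|\le|I|}\langle f,\mathbf 1_{E(I)}\rangle\langle\chi_I,\chi_J\rangle\langle\mathbf 1_{E(J)},f\rangle\lesssim\sum_{I\in\mathcal D}\langle f,\mathbf 1_{E(I)}\rangle\langle f,\chi_I\rangle=\langle f,Af\rangle.$$

It follows that

$$\|A^*\|_2^2=\sup_{\|f\|_2=1}\langle A^*f,A^*f\rangle=\sup_{\|f\|_2=1}\langle f,AA^*f\rangle\lesssim\|A\|_2,$$

and since $\|A\|_2=\|A^*\|_2$, we get $\|A\|_2\lesssim1$, as claimed.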
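The $TT^*$ argument gives an unspecified constant. In the model case $\chi_I=|I|^{-1}\mathbf 1_I$ on a finite dyadic grid (our simplifying assumption), $Mf$ is the maximal function of a dyadic martingale and Doob's inequality gives the explicit bound $\|Mf\|_2\le2\|f\|_2$. The sketch below, with hypothetical helper names, computes this dyadic maximal function on $2^n$ equal cells of $[0,1)$ so that the bound can be checked numerically; it is a check, not a substitute for the proof above.

```python
import math

def dyadic_maximal(f):
    """sup over dyadic I containing each cell of |average of f over I|.

    f lists the values on 2**n equal cells of [0, 1); chi_I = |I|^{-1} 1_I is assumed.
    """
    n = len(f)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    best = [abs(v) for v in f]   # the cells themselves are the smallest dyadic intervals
    avgs = list(f)               # averages over the dyadic intervals of the current width
    width = 1
    while len(avgs) > 1:
        # parent average = mean of the two children (equal lengths)
        avgs = [(avgs[2 * j] + avgs[2 * j + 1]) / 2 for j in range(len(avgs) // 2)]
        width *= 2
        for i in range(n):
            best[i] = max(best[i], abs(avgs[i // width]))
    return best

def l2_norm(g):
    """L^2([0, 1)) norm of a step function with equal cells."""
    return math.sqrt(sum(v * v for v in g) / len(g))
```

For a single spike, `dyadic_maximal([1.0, 0, 0, 0, 0, 0, 0, 0])` halves at each scale and returns `[1.0, 0.5, 0.25, 0.25, 0.125, 0.125, 0.125, 0.125]`.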



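We shall have recourse not only to this, but to a particular refinement. Let $\mathcal J$ be a partition of $\mathbb R$ into dyadic intervals. To each $J\in\mathcal J$, associate a subset $G(J)\subset J$ with $|G(J)|\le\delta|J|$, where $0<\delta<1$ is fixed. Consider

$$M_\delta f:=\sum_{J\in\mathcal J}\mathbf 1_{G(J)}\sup_{I\supseteq J}|\langle f,\chi_I\rangle|.\tag{6.1}$$

Then $\|M_\delta\|_2\lesssim\sqrt\delta$. The proof is

$$\int|M_\delta f|^2\,dx=\sum_{J\in\mathcal J}|G(J)|\sup_{I\supseteq J}|\langle f,\chi_I\rangle|^2\le\delta\sum_{J\in\mathcal J}|J|\sup_{I\supseteq J}|\langle f,\chi_I\rangle|^2\le\delta\int|Mf|^2\,dx\lesssim\delta\|f\|_2^2,$$

where the next-to-last step uses $Mf(x)\ge\sup_{I\supseteq J}|\langle f,\chi_I\rangle|$ for $x\in J$.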
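The refinement can also be checked numerically. In the sketch below, which is our own simplification rather than the general setting, $\chi_I=|I|^{-1}\mathbf 1_I$, $f$ is piecewise constant on $2^n$ cells of $[0,1)$, the partition $\mathcal J$ consists of dyadic blocks of a single width, and each $G(J)$ is the initial segment of $m$ cells of $J$, so that $\delta=m/\text{width}$. The computation mirrors the proof: $\|M_\delta f\|_2^2$ equals $\delta\sum_J|J|\sup_{I\supseteq J}|\langle f,\chi_I\rangle|^2$, which is at most $\delta\|Mf\|_2^2\le4\delta\|f\|_2^2$ by Doob's inequality. Helper names are hypothetical.

```python
import math

def ancestor_sups(f, width):
    """For each block J of `width` cells: max over dyadic I containing J of |average of f over I|."""
    n = len(f)
    level = [sum(f[j:j + width]) / width for j in range(0, n, width)]  # averages over the blocks J
    sups = [abs(a) for a in level]
    ratio = 1  # number of blocks J inside each interval of the current level
    while len(level) > 1:
        level = [(level[2 * j] + level[2 * j + 1]) / 2 for j in range(len(level) // 2)]
        ratio *= 2
        for b in range(len(sups)):
            sups[b] = max(sups[b], abs(level[b // ratio]))
    return sups

def m_delta(f, width, m):
    """M_delta f with J = blocks of `width` cells and G(J) = first m cells of J (delta = m / width)."""
    n = len(f)
    if n & (n - 1) or width & (width - 1) or not 0 < width <= n or not 0 <= m <= width:
        raise ValueError("need power-of-two sizes and 0 <= m <= width <= len(f)")
    out = [0.0] * n
    for b, s in enumerate(ancestor_sups(f, width)):
        out[b * width:b * width + m] = [s] * m  # M_delta f is constant on G(J), zero elsewhere
    return out

def l2_norm(g):
    """L^2([0, 1)) norm of a step function with equal cells."""
    return math.sqrt(sum(v * v for v in g) / len(g))
```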






We begin the main line of the argument. Let $\delta=\operatorname{dense}(T)$ and $\sigma=\operatorname{size}(T)$. Make a choice of signs $\varepsilon_s\in\{\pm1\}$ such that

$$\sum_{s\in T}|\langle f,\varphi_s\rangle\langle\phi_s,\mathbf 1_E\rangle|=\int_E\sum_{s\in T}\varepsilon_s\langle f,\varphi_s\rangle\phi_s\,dx.$$

Because of the "Schwartz tails," the integrand above is supported on the whole real line. Let $\mathcal J$ be the partition of $\mathbb R$ consisting of the maximal dyadic intervals $J$ such that $3J$ does not contain any $I_s$ for $s\in T$. It is helpful to observe that for such $J$, if $|J|<|I_T|$, then $J\subset3I_T$; and if $|J|\ge|I_T|$, then $\operatorname{dist}(J,I_T)\gtrsim|J|$. The integral above is at most the sum over $J\in\mathcal J$ of the two terms below.

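$$\sum_{\substack{s\in T\\|I_s|\le|J|}}|\langle f,\varphi_s\rangle|\int_{J\cap E}|\phi_s|\,dx,\tag{6.2}$$

$$\int_{J\cap E}\Big|\sum_{\substack{s\in T\\|I_s|>|J|}}\varepsilon_s\langle f,\varphi_s\rangle\phi_s\Big|\,dx.\tag{6.3}$$

Notice that for the second sum to be non-zero, we must have $J\subset3I_T$.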

The first term (6.2) is controlled by an appeal to the "Schwartz tails." Fix an integer $n\ge0$, and only consider those $s\in T$ for which $|I_s|=2^{-n}|J|$. Now, the distance of $I_s$ to $J$ is $\gtrsim|J|$. And,

$$|\langle f,\varphi_s\rangle|\int_{J\cap E}|\phi_s|\,dx\lesssim\sigma\delta\,\big(1+|I_s|^{-1}\operatorname{dist}(I_s,J)\big)^{-10}|I_s|.$$

The intervals $I_s$ are contained in $I_T$, so that summing this over $|I_s|=2^{-n}|J|$ will give us

$$\lesssim\sigma\delta\,2^{-n}\min\big(|J|,\ |I_T|\,(\operatorname{dist}(J,I_T)\,|I_T|^{-1})^{-10}\big).$$

This is summed over $n\ge0$ and $J\in\mathcal J$ to bound (6.2) by $\lesssim\sigma\delta|I_T|$, as required.



Critical to the control of (6.3) is the following observation. Let

$$G(J)=J\cap\bigcup_{\substack{s\in T\\|I_s|>|J|}}N^{-1}(\omega_{s+}).\tag{6.4}$$

Then $|G(J)|\lesssim\delta|J|$. To see this, let $J'$ be the next larger dyadic interval that contains $J$. Then $3J'$ must contain some $I_{s'}$, for $s'\in T$. Let $s''$ be the tile with $I_{s'}\subset I_{s''}$, $|I_{s''}|=|J'|$, and $\omega_T\subset\omega_{s''}$. Then $s'<s''$, and by the definition of density,

$$\int_{E\cap N^{-1}(\omega_{s''})}\chi_{I_{s''}}\,dx\le\delta.$$

But, for each $s$ as in (6.4), we have $\omega_s\subset\omega_{s''}$, so that $G(J)\subset N^{-1}(\omega_{s''})$. Our claim follows.
Suppose that $T$ is a $-$tree. That means that the tiles $\{I_s\times\omega_{s+}:s\in T\}$ are disjoint. We use an estimate that ignores any cancellation effects. Then the bound for (6.3) is no more than

$$|G(J)|\,\Big\|\sum_{\substack{s\in T\\|I_s|>|J|}}|\langle f,\varphi_s\rangle\phi_s|\Big\|_\infty\lesssim\delta\sigma|J|.$$

This is summed over $J\subset3I_T$ to get the desired bound.

Suppose that $T$ is a $+$tree. (This is the interesting case.) Then the tiles $\{I_s\times\omega_{s-}:s\in T\}$ are pairwise disjoint, and we set

$$F=\operatorname{Mod}_{-c(\omega_T)}\sum_{s\in T}\varepsilon_s\langle f,\varphi_s\rangle\varphi_s.$$

Here it is useful to us that we only use the "smooth" functions $\varphi_s$ in the definition of this function. Note that $\|F\|_2\lesssim\sigma\sqrt{|I_T|}$, which is a consequence of the definition of size and Proposition 2.12. Set $\tau(x)=\sup\{|I_s|:s\in T,\ N(x)\in\omega_{s+}\}$, and observe that for each $J$ and $x\in J$,

$$\Big|\sum_{\substack{s\in T\\|I_s|>|J|}}\varepsilon_s\langle f,\varphi_s\rangle\phi_s(x)\Big|=\Big|\sum_{\substack{s\in T\\\tau(x)\ge|I_s|>|J|}}\varepsilon_s\langle f,\varphi_s\rangle\varphi_s(x)\Big|.$$

This is so since all of the intervals $\omega_{s+}$ must contain $\omega_T$, and if $N(x)\in\omega_{s+}$, then it must also be in every other $\omega_{s'+}$ that is larger. What is significant here is that on the right we have a truncation of the sum that defines $F$.

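This last sum can be dominated by a maximal function. For any $\tau>0$ and $J\in\mathcal J$, let

$$F_{\tau,J}=\operatorname{Mod}_{-c(\omega_T)}\sum_{\substack{s\in T\\\tau\ge|I_s|>|J|}}\varepsilon_s\langle f,\varphi_s\rangle\varphi_s.$$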

This function has Fourier support in the interval [— ||J| _1 , — In particular, recalling 

how we defined ip, we can choose i < a, b < | so that 

^^(Dil^^-Dit^)*^ 

We conclude that for $x\in G(J)$,

$$|F_{\tau(x),J}(x)|\lesssim M_\delta F(x),$$

where $M_\delta$ is defined as in (6.1), with the sets $G(J)$ of (6.4).

The conclusion of this proof is now at hand. We have

$$\sum_{J\subset3I_T}\int_{G(J)}|F_{\tau(x),J}(x)|\,dx\lesssim\int_{\bigcup_{J\subset3I_T}G(J)}M_\delta F\,dx\le\Big|\bigcup_{J\subset3I_T}G(J)\Big|^{1/2}\|M_\delta F\|_2\lesssim\delta\sqrt{|I_T|}\,\|F\|_2\lesssim\delta\sigma|I_T|.$$



6.1. Complements. 

Remark 6.5. The estimate below is somewhat cruder than the one just obtained, and therefore 
easier to obtain. For all trees T, 



set 



1/2 



seT 



< size(T)|/ T | 1 / 2 . 

Remark 6.6. The maximal function $M_\delta$ in (6.1) admits the bounds

$$\|M_\delta\|_p\lesssim\delta^{1/p},\qquad1<p<\infty.$$

This depends upon the fact that the maximal function itself maps $L^p$ into itself, for $1<p<\infty$.



Remark 6.7. For any tree T, 

||1>'^« <5 1/P \\9\\ 



pi 



1 < p < oo. 



Remark 6.8. For a +tree T, 



|£</,v-> 



< 



1/2 



1 < p < OO. 



Conclude that

$$\Big\|\sum_{s\in T}\langle f,\varphi_s\rangle\varphi_s\Big\|_p\lesssim\operatorname{size}(T)|I_T|^{1/p},\qquad1<p<\infty.$$



7. Carleson's Theorem on $L^p$, $1<p\neq2<\infty$



We outline a proof that the Carleson maximal operator maps $L^p$ into itself for all $1<p<\infty$. The key point is that we should obtain a distributional estimate for the model operator.

Proposition 7.1. For $1<p<\infty$, there is a constant $K_p$, depending only on $p$, so that for all sets $E\subset\mathbb R$ of finite measure and all measurable functions $N$, we have

$$|\{|C_N\mathbf 1_E|>\lambda\}|\le K_p^p\,\lambda^{-p}|E|.\tag{7.2}$$
Interpolation provides the $L^p$ inequalities. We shall in fact prove that for all sets $E$ and $F$, there is a set $F'\subset F$ of measure $|F'|\ge\frac12|F|$ such that

$$|\langle C\mathbf 1_E,\mathbf 1_{F'}\rangle|\lesssim\min(|E|,|F|)\Big(1+\Big|\log\frac{|E|}{|F|}\Big|\Big).\tag{7.3}$$

It is a routine matter to see that this estimate implies that

$$|\{|C\mathbf 1_E|>\lambda\}|\lesssim|E|\times\begin{cases}\lambda^{-1}|\log\lambda|,&0<\lambda<1/2,\\ e^{-c\lambda},&\text{otherwise}.\end{cases}\tag{7.4}$$

Here, $c$ is an absolute constant. This distributional inequality is in fact the best that is known about the Carleson operator. See Sjölin [88] for the Walsh case, and [89] for the Fourier case. A more recent publication proving the same point is Arias de Reyna [6]. Both authors present a proof along the lines of Carleson. We follow the weak type inequality approach of Muscalu, Tao, and Thiele [78]. The relevance of this approach to the Carleson theorem was demonstrated by Grafakos, Tao, and Terwilleger [42].

We shall find it necessary to appeal to some deeper properties of the Calderón–Zygmund theory, in particular a weak $L^1$ bound for the maximal function, but also the bound in (7.11) below.

In proving (7.3), we can rely upon invariance under dilations, up to a change in the measurable function $N(x)$, to assume that $1/2<|E|\le1$. As we already know the weak $L^2$ estimate, (7.3) is obvious for $\frac13\le|F|\le3$. The argument then splits into two cases, that of $|F|<\frac13$ or $|F|>3$.

Note that our measurable function $N(x)$ is defined on the set $F$. It is clear that our Density Lemma, Lemma 3.6, continues to hold in this context, with the change that the measure of $F$ should be added to the right hand side of (3.7).



7.1. The case of $|F|<\frac13$. In this case, we will take $F'=F$. Recall that $\mathbf T$ denotes the set of all tiles. Clearly, $\operatorname{size}(\mathbf T)\le1$. We repeat the argument of (3.13)–(3.16). Here, we should keep in mind that we want to balance the estimate for the $\operatorname{count}(\cdot)$ function, and that we have a better upper bound on the count function coming from the Density Lemma. Thus, $\mathbf T$ is a union of collections $S_n$, for $n\ge0$, so that

$$\operatorname{dense}(S_n)\le2^{-2n},\tag{7.5}$$

$$\operatorname{size}(S_n)\le\min(1,2^{-n}|F|^{-1/2}),\tag{7.6}$$

$$\operatorname{count}(S_n)\lesssim2^{2n}|F|.\tag{7.7}$$

Then by the calculation of (3.16), we have

$$\sum_{s\in S_n}|\langle\mathbf 1_E,\varphi_s\rangle\langle\phi_s,\mathbf 1_F\rangle|\lesssim\operatorname{dense}(S_n)\operatorname{size}(S_n)2^{2n}|F|\lesssim\min(|F|,|F|^{1/2}2^{-n}).$$

The sum of these terms over $n\ge0$ is no more than $\lesssim|F|\cdot|\log|F||$, which is as required.

7.2. The case of $|F|>3$. This case corresponds to the analysis of the Carleson operator on $L^p$ for $1<p<2$. We shall need a more delicate weak type inequality below to complete this proof. To define the set $F'$, let

$$\Omega=\{M\mathbf 1_E>C_1|F|^{-1}\}.$$

By the weak $L^1$ inequality for the maximal function, for an absolute choice of $C_1$, we have $|\Omega|\le\frac12|F|$. And we take $F'=F\cap\Omega^c$. The inner product in (7.3) is less than the sum of

$$\sum_{\substack{s\in\mathbf T\\I_s\subset\Omega}}|\langle\mathbf 1_E,\varphi_s\rangle\langle\phi_s,\mathbf 1_{F'}\rangle|,\tag{7.8}$$

$$\sum_{\substack{s\in\mathbf T\\I_s\not\subset\Omega}}|\langle\mathbf 1_E,\varphi_s\rangle\langle\phi_s,\mathbf 1_{F'}\rangle|.\tag{7.9}$$

These sums are handled separately.



For (7.8), observe that $\varphi_s$ is essentially supported inside of $\Omega$, while $F'$ lies outside of $\Omega$. Thus, we should rely upon the Schwartz tails to handle this term. Let $J\subset\Omega$ be an interval such that $2^kJ\subset\Omega$ but $2^{k+1}J\not\subset\Omega$. We observe two inequalities for such an interval, which are stated using the function $\chi_J$ as defined in (3.4). The first is that

$$\int_{F'}\chi_J\,dx\le\int_{(2^kJ)^c}\chi_J\,dx\lesssim2^{-(\kappa-1)k}.$$

Here, $\kappa$ is a large constant in the definition of $\chi$. Also, we have

$$\int_E\chi_J\,dx\lesssim2^k\int_E\chi_{2^{k+1}J}\,dx\lesssim2^k\inf_{x\in2^{k+1}J}M\mathbf 1_E(x)\lesssim2^k|F|^{-1}.$$

The last inequality follows as some point in $2^{k+1}J$ must lie outside of $\Omega$.

Observe that, for each $x$, among all tiles $s$ with $I_s=J$ there is exactly one tile $s$ with $N(x)\in\omega_s$. Hence

$$\sum_{\substack{s\in\mathbf T\\I_s=J}}|\langle\mathbf 1_E,\varphi_s\rangle\langle\phi_s,\mathbf 1_{F'}\rangle|\lesssim|J|\,\langle\mathbf 1_E,\chi_J\rangle\langle\mathbf 1_{F'},\chi_J\rangle\lesssim2^{-(\kappa-2)k}|F|^{-1}|J|.$$

Recall that $k$ is associated to how deeply $J$ is embedded in $\Omega$, and that $\Omega$ has measure at most $\lesssim|F|$. Hence the right hand side above can be summed over $J\subset\Omega$ to see that

$$(7.8)\lesssim1,$$

which is better than desired.



We turn to the second estimate. Set $\mathbf T_{\rm out}:=\{s\in\mathbf T:I_s\not\subset\Omega\}$, which is the collection of tiles summed over in (7.9). The essential aspect of the definition of $\Omega$ is this Lemma.

Lemma 7.10. $\operatorname{size}(\mathbf T_{\rm out})\lesssim|F|^{-1}$.

Assuming the Lemma, we turn to the line of argument (3.13)–(3.16). The collection $\mathbf T_{\rm out}$ can be decomposed into collections $S_n$, for $n\ge0$, for which (7.5) and (7.7) hold, and in addition

$$\operatorname{size}(S_n)\lesssim\min(|F|^{-1},|F|^{-1/2}2^{-n}).$$

Then by the calculation of (3.16), we have

$$\sum_{s\in S_n}|\langle\mathbf 1_E,\varphi_s\rangle\langle\phi_s,\mathbf 1_{F'}\rangle|\lesssim\min(1,|F|^{1/2}2^{-n}),$$

making the sum over $n\ge0$ no more than $\lesssim\log|F|$, as required. This completes the proof of (7.2).
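The two summations in Sections 7.1 and 7.2 can be checked with a short script. This is a sketch under stated assumptions: all implicit constants in (7.5)–(7.7) and in the size bounds are set to 1, and the function names are hypothetical. It multiplies the bounds for density, size and count as in the calculation of (3.16), and compares the result with the window $\frac12|\log_2|F||\le(\text{sum})\le\frac12|\log_2|F||+3$ (scaled by $|F|$ in the first case), which follows by counting the terms at which the minimum equals its cap and bounding the remaining geometric tail by twice the cap.

```python
import math

def small_F_sum(F, n_max=200):
    """Sum over n of dense * size * count for |F| < 1/3, bounds (7.5)-(7.7) with constants 1."""
    total = 0.0
    for n in range(n_max):
        dense = 2.0 ** (-2 * n)
        size = min(1.0, 2.0 ** (-n) * F ** -0.5)
        count = 2.0 ** (2 * n) * F
        total += dense * size * count   # equals min(F, sqrt(F) * 2**-n)
    return total

def large_F_sum(F, n_max=200):
    """Same bookkeeping for |F| > 3, with size(S_n) <= min(1/F, F**-0.5 * 2**-n)."""
    total = 0.0
    for n in range(n_max):
        dense = 2.0 ** (-2 * n)
        size = min(1.0 / F, 2.0 ** (-n) * F ** -0.5)
        count = 2.0 ** (2 * n) * F
        total += dense * size * count   # equals min(1, sqrt(F) * 2**-n)
    return total

def predicted_window(F):
    """(L, L + 3) with L = (1/2) * |log2 F|, roughly the number of terms at their cap."""
    L = 0.5 * abs(math.log2(F))
    return L, L + 3
```

So the first sum is $\approx|F|\log(1/|F|)$ and the second is $\approx\log|F|$, matching the right hand side of (7.3) when $|E|\approx1$.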



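Proof. This is a consequence of the particular structure of a $+$tree $T$, and the fact that for $s\in T$, the distance of $\operatorname{supp}(\widehat{\varphi_s})$ to $\omega_T$ is approximately $|\omega_s|$. The Calderón–Zygmund theory applies, and shows that for any choice of signs $\varepsilon_s\in\{\pm1\}$, for $s\in T$,

$$\Big|\Big\{\Big|\sum_{s\in T}\varepsilon_s\langle f,\varphi_s\rangle\varphi_s\Big|>\lambda\Big\}\Big|\lesssim\lambda^{-1}\big\|f\,|I_T|\chi_{I_T}\big\|_1,\qquad\lambda>0.\tag{7.11}$$

We apply this inequality for trees $T\subset\mathbf T_{\rm out}$, and $f=\mathbf 1_E$. By taking the average over all choices of signs, we can conclude a distributional estimate on the square functions

$$A_T:=\Big(\sum_{s\in T}\frac{|\langle\mathbf 1_E,\varphi_s\rangle|^2}{|I_s|}\mathbf 1_{I_s}\Big)^{1/2}.\tag{7.12}$$

Namely, that for each $+$tree $T\subset\mathbf T_{\rm out}$,

$$|\{A_T>\lambda\}|\lesssim\lambda^{-1}|I_T|\,|F|^{-1}.\tag{7.13}$$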
As this inequality applies to all subtrees of $T$, it can be strengthened. (This is a reflection of the John–Nirenberg inequality.) Fix the $+$tree $T\subset\mathbf T_{\rm out}$. We wish to conclude that

$$\int A_T^2\,dx\lesssim|F|^{-2}|I_T|.\tag{7.14}$$

For a subset $T'\subset T$, let

$$\operatorname{sh}(T'):=\bigcup_{s\in T'}I_s$$

be the shadow of $T'$. A shadow is not necessarily an interval. Define $A_{T'}$ as in (7.12). And finally set

$$G(\lambda)=\sup_{T'\subset T}|\operatorname{sh}(T')|^{-1}\,\big|\{|F|A_{T'}>\lambda\}\big|.\tag{7.15}$$

Notice that (7.13) implies that $G(\lambda)\lesssim\lambda^{-1}$, for $\lambda>0$. If we show that $G(\lambda)\lesssim\lambda^{-4}$ for $\lambda>1$, we can conclude (7.14). In fact we can show that $G(\lambda)$ decays at an exponential squared rate, which is the optimal estimate.

Observe that (7.13), applied to trees consisting of a single tile, implies that

$$\sup_{s\in T}\frac{|F|\,|\langle\mathbf 1_E,\varphi_s\rangle|}{\sqrt{|I_s|}}\le\lambda_0<\infty$$

for an absolute constant $\lambda_0$. Thus, the square functions $A_T$ we are considering can only take incremental steps of a strictly bounded size.

For any $\lambda>\sqrt2\lambda_0$, let us bound $G(\sqrt2\lambda)$. Fix $T'$ achieving the supremum in the definition of $G(\sqrt2\lambda)$. Consider a somewhat smaller threshold, namely $\{|F|A_{T'}>\lambda\}$. In order to proceed, consider a function $\tau:\operatorname{sh}(T')\to\mathbb R_+$ such that

$$|F|^2\sum_{\substack{s\in T'\\|I_s|>\tau(x)}}\frac{|\langle\mathbf 1_E,\varphi_s\rangle|^2}{|I_s|}\mathbf 1_{I_s}(x)\le\lambda^2.$$

In addition require that $\tau(x)$ is the smallest such function satisfying this condition. It is then the case that the same sum, taken over $|I_s|\ge\tau(x)$, can be no more than $\lambda^2+\lambda_0^2$.

Take $T''\subset T'$ to be the tree

$$T'':=\{s\in T':|I_s|<\tau(x),\ x\in I_s\}.$$

The point of these definitions is that $\operatorname{sh}(T'')\subset\{|F|A_{T'}>\lambda\}$, and that

$$|F|A_{T'}(x)>\sqrt2\lambda\quad\text{implies}\quad|F|A_{T''}(x)>\sqrt{\lambda^2-\lambda_0^2}.$$

Therefore,

$$\begin{aligned}|\operatorname{sh}(T')|\,G(\sqrt2\lambda)&=|\{|F|A_{T'}>\sqrt2\lambda\}|\\&\le|\{|F|A_{T''}>\sqrt{\lambda^2-\lambda_0^2}\}|\\&\le|\operatorname{sh}(T'')|\,G(\sqrt{\lambda^2-\lambda_0^2})\\&\le|\operatorname{sh}(T')|\,G(\lambda)\,G(\sqrt{\lambda^2-\lambda_0^2}).\end{aligned}$$

We conclude that $G(\sqrt2\lambda)\le G(\sqrt{\lambda^2-\lambda_0^2})^2$.

To conclude, we should in addition require that $\lambda$ is so large that $G(\kappa\lambda)\le\frac12$, where

$$\kappa:=\prod_{k=1}^{\infty}\sqrt{1-2^{-k}}.$$

An induction argument will then show that

$$G(2^{k/2}\lambda)\le G(\kappa\lambda)^{2^k}\le2^{-2^k},\qquad k\ge0,\tag{7.16}$$

which is the claimed exponential squared decay. Our proof of the Lemma is done. □
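The mechanism behind (7.16) can also be seen with a slightly different bookkeeping, which we give as our own sketch rather than as the text's induction with $\kappa$. Start from a level $a_0>\lambda_0$ with $G(a_0)\le\frac12$, put $\mu=\sqrt{a_k^2+\lambda_0^2}$ (so that $\mu>\sqrt2\lambda_0$ and $\sqrt{\mu^2-\lambda_0^2}=a_k$), and let $a_{k+1}=\sqrt2\mu$. The recursion $G(\sqrt2\mu)\le G(\sqrt{\mu^2-\lambda_0^2})^2$ then gives $G(a_k)\le2^{-2^k}$, while $a_k^2+2\lambda_0^2=2^k(a_0^2+2\lambda_0^2)$. Hence $G(a_k)\le2^{-(a_k^2+2\lambda_0^2)/(a_0^2+2\lambda_0^2)}$, a Gaussian bound in $a_k$. The function below, with a hypothetical name, iterates this and tracks $\log_2$ of the bound.

```python
import math

def iterate_bound(a0, lam0, steps):
    """Iterate G(sqrt(2)*mu) <= G(sqrt(mu**2 - lam0**2))**2 starting from G(a0) <= 1/2.

    Returns a list of (a_k, log2 of the resulting upper bound for G(a_k)).
    """
    if a0 <= lam0:
        raise ValueError("need a0 > lam0 so that mu > sqrt(2)*lam0 at every step")
    a, log2_g = a0, -1.0               # G(a0) <= 2**-1
    out = [(a, log2_g)]
    for _ in range(steps):
        mu = math.sqrt(a * a + lam0 * lam0)   # chosen so that sqrt(mu**2 - lam0**2) == a
        a = math.sqrt(2.0) * mu               # new level sqrt(2) * mu
        log2_g *= 2.0                         # squaring the bound doubles the exponent
        out.append((a, log2_g))
    return out
```

With `a0 = 2.0` and `lam0 = 1.0`, each pair satisfies $-\log_2(\text{bound})=2^k=(a_k^2+2)/6$, which is the closed form above.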
7.3. Complements. 

Remark 7.17. In the inequality (7.2) , one can show that the constants K p on the right hand 
side obey 

p — 1 

Remark 7.18. If it is the case that for some < a < 1, we have the inequality 

sup |sh(T')r 1/a ||A T '|U < oo 

T'CT 

then, the stronger estimate below holds. 

I|At||i< |sh(T)|. 

Remark 7.19. In (7.2), we assert the restricted weak type inequality for $1<p<\infty$. The weak type estimate for $2<p<\infty$ is in fact directly available. That is, for $2<p<\infty$, and $f\in L^p$ of norm one,

$$|\{Cf>\lambda\}|\lesssim\lambda^{-p},\qquad\lambda>0.$$

The key point is to take advantage of the fact that $f$ is locally square integrable. A very brief sketch of the argument follows. (1) It suffices to prove the inequality above for $\lambda=1$. (2) Define $\Omega=\{M|f|^2>1\}$, and show that $|\Omega|\lesssim1$. (3) Define sums as in (7.8) and (7.9), and control each term separately. One will need to replace the Size Lemma as stated with Remark 5.15.
8. Remarks



Remark 8.1. After L. Carleson [24] proved his theorem, R. Hunt [44] extended the argument to $L^p$, for $1<p<\infty$. A similar extension, in the Walsh–Paley case, was done by Billard [13], in the case of $L^2$, and Sjölin [88], for all $1<p<\infty$. The Carleson theorem has equivalent formulations on the groups $\mathbb R$, $\mathbb T$, and $\mathbb Z$. The last case, of the integers, was explicitly discussed by Máté [70]. This paper was overlooked until recently.

Remark 8.2. C. Fefferman [38] devised an alternate proof, which proved to be influential through its use of methods of analysis that used both time and frequency information in an operator theoretic fashion. The proof of Lacey and Thiele [65] presented here borrows several features of that proof. The notion of tiles, and the partial order on tiles, is due to C. Fefferman [38]. Likewise, the Density Lemma and the Tree Lemma, and the proofs of the same, have clear antecedents in this paper.

Remark 8.3. Those familiar with the Littlewood Paley theory know that it is very useful in 
decoupling the scales of operators, like those of the Hilbert transform. The Carleson operator 
is, however, not one in which scales can be decoupled. This is another source of the interest 
in this Theorem. 

Remark 8.4. We present the proof of the Carleson theorem on the real line due to the presence 
of the dilation structure. 

Remark 8.5. We choose to express the Carleson operator in terms of the projection $P_-$. This operator is a linear combination of the identity operator and the Hilbert transform given by

$$Hf(x):=\lim_{\epsilon\to0}\int_{|y|>\epsilon}f(x-y)\,\frac{dy}{y}.$$

Hence, an alternate form of the Carleson operator is

$$\sup_N\Big|\lim_{\epsilon\to0}\int_{|y|>\epsilon}e^{iNy}f(x-y)\,\frac{dy}{y}\Big|.\tag{8.6}$$

This form is suggestive of other questions related to the Carleson Theorem, a point we rely upon below.

Remark 8.7. Despite the fact that Carleson's operator maps $L^2$ into itself, all three known proofs of Carleson's theorem establish the weak type bound on $L^2$. The strong type bound must be deduced by interpolation. On the other hand, the weak type bound is a known consequence of the pointwise convergence of Fourier series. This was observed by A. Calderón, as indicated by a footnote in [107], and is a corollary of a general observation of E.M. Stein [91].

Remark 8.8. Hunt and Young [45] have established a weighted estimate for the Carleson operator. Namely, for a weight $w$ in the class $A_p$, the Carleson operator maps $L^p(w)$ into itself, for $1<p<\infty$. The method of proof utilizes the known Carleson bound and distributional inequalities for the Hilbert transform.
Remark 8.9. Proposition 2.8 has a well known antecedent in a characterization of (a constant times) the Hilbert transform as the unique operator $A$ such that $A$ is a bounded operator on $L^2$ that commutes with translations, is invariant under dilations, $A^2$ is the identity, but is not itself the identity. See [92].

Remark 8.10. The inequality (3.3) eschews all additional cancellations. It shows that all the necessary cancellation properties are already encoded in the decomposition of the operator. In addition, the combinatorial model of the Carleson operator is in fact unconditionally convergent in $s\in\mathbf T$. This turns out to be an extremely useful fact in the course of the proof: one is free to group the tiles in any way that one likes.

Remark 8.11. The Size Lemma should be compared to Rubio de Francia's extension of the classical Littlewood Paley inequality [87]. Also see the author's recent survey of this theorem [58].

Remark 8.12. A $+$tree $T$ is a familiar object. Aside from a modulation by $c(\omega_T)$, it shares most of the properties associated with sums of wavelets. In particular, if G u T , note that

where the last norm is the BMO norm.

Remark 8.13. The key instance of the Tree Lemma is that of a $+$tree. This case corresponds to a particular maximal function applied to a function associated to the tree. It is at this point that the supremum of Carleson's theorem is controlled by a much tamer supremum: the one in the ordinary maximal function.

Remark 8.14. The statement and proof of the Size Lemma, Lemma 3.9, replace the initial arguments of this type that are in Lacey and Thiele [62]. This argument has proven to be very flexible in its application. And, in some instances it produces sharp estimates, as explained by Barrionuevo and Lacey [12].

Remark 8.15. The set of functions $\{\varphi_s:s\in\mathbf T_k\}$, where $\mathbf T_k:=\{s\in\mathbf T:|I_s|=2^k\}$, is an example of a Gabor basis. For appropriate choice of $\varphi$, the operator

$$f\mapsto\sum_{\substack{s\in\mathbf T\\|I_s|=2^k}}\langle f,\varphi_s\rangle\varphi_s$$

is in fact the identity operator. See the survey of I. Daubechies [35].



9. Complements and Extensions 



9.1. Equivalent Formulations of Carleson's Theorem. The Fourier transform has a formulation on each of the Euclidean groups $\mathbb R$, $\mathbb Z$ and $\mathbb T$. Carleson's original proof worked on $\mathbb T$. Fefferman's proof translates very easily to $\mathbb R$. Máté [70] extended Carleson's proof to $\mathbb Z$. Each of the statements of the theorem can be stated in terms of a maximal Fourier multiplier theorem, and we have stated it as such in this paper. Inequalities for such operators can be transferred between these three Euclidean groups, as was done by P. Auscher and M.J. Carro [11].

Transference has also been studied in the bilinear setting. See articles by Fan and Sato, as well as Blasco and Villarroya [15,37].

9.2. Fourier Series Near $L^1$, Part 1. The point at issue here is the determination of the integrability class which guarantees the pointwise convergence of Fourier series. The natural setting for these questions is the unit circle $\mathbb T=[0,1]$, and the partial Fourier sums

$$S_Nf(x)=\sum_{|n|\le N}\hat f(n)e^{2\pi inx},\qquad\hat f(n)=\int_{\mathbb T}f(x)e^{-2\pi inx}\,dx.$$

In the positive direction, one seeks the "smallest" function $\psi$ such that if $\int_{\mathbb T}\psi(|f|)\,dx<\infty$, then the Fourier series of $f$ converges pointwise.
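To make the partial sums concrete, here is a small sketch (our own illustration, not from the text) that evaluates $S_Nf$ for the hypothetical test function $f=\mathbf 1_{[0,1/2)}$, whose coefficients are $\hat f(0)=\frac12$ and $\hat f(n)=(1-(-1)^n)/(2\pi in)$ for $n\neq0$. Pairing $n$ with $-n$ shows $S_Nf(\tfrac14)=\frac12+\frac2\pi\sum_{n\le N,\ n\text{ odd}}(-1)^{(n-1)/2}/n$, a Leibniz series converging to $f(\tfrac14)=1$, while at the jump $x=0$ every $S_Nf(0)$ equals the midpoint value $\frac12$.

```python
import cmath
import math

def fourier_coeff_half_indicator(n):
    """hat f(n) = int_0^1 f(x) e^{-2 pi i n x} dx for f = indicator of [0, 1/2)."""
    if n == 0:
        return 0.5
    return (1 - (-1) ** n) / (2j * math.pi * n)

def partial_sum(x, N):
    """S_N f(x) = sum over |n| <= N of hat f(n) e^{2 pi i n x}."""
    return sum(fourier_coeff_half_indicator(n) * cmath.exp(2j * math.pi * n * x)
               for n in range(-N, N + 1))
```

For this $f$ the convergence is only conditional, at rate about $1/N$ away from the jumps, which is typical of the borderline behavior discussed in this section.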

N. Antonov [2] has found the best result to date.

Theorem 9.1. For all functions $f\in L(\log L)(\log\log\log L)(\mathbb T)$, the partial Fourier series of $f$ converges to $f$ pointwise almost everywhere.

This extends the result of P. Sjölin [88,89], who had the result above, but with a double log where there is a triple log above. Arias de Reyna [6,7] has noted an extension of this Theorem, in that one can define a rearrangement invariant Banach space $B$, so that pointwise convergence holds for all $f\in B$, and $B$ contains $L(\log L)(\log\log\log L)$.

The method of proof takes as its starting point the distributional estimate of (7.4). One seeks to "extrapolate" these inequalities to the setting of the Theorem above, and Antonov nicely exploits the explicit nature of the kernels involved in this maximal operator. Also see the work of P. Sjölin and F. Soria [90], who demonstrate that Antonov's approach extends to other maximal operator questions.

9.3. Fourier Series Near $L^1$, Part 2. In the negative direction, A.N. Kolmogorov's fundamental example [48,49] of an integrable function with pointwise divergent Fourier series admits a strengthening to the following statement, as obtained by Körner [52].

Theorem 9.2. For all $\psi(x)=o(\log\log x)$, there is a function $f:[0,2\pi]\to\mathbb R$ with divergent Fourier series, and $\int|f|\psi(|f|)\,dx<\infty$.

The underlying method of proof was, in some essential way, unsurpassed until quite recently, when S. Konyagin [50,51] proved

Theorem 9.3. The previous Theorem holds assuming only

$$\psi(x)=o\Big(\sqrt{\frac{\log x}{\log\log x}}\Big).\tag{9.4}$$

There is a related question, on the growth of partial sums of Fourier series of integrable functions. G.H. Hardy [43] showed that for integrable functions $f$, one has $S_nf=o(\log n)$ a.e., and asked if this is the best possible estimate. This question is still open, with the best result from below following from Konyagin's example. With $\psi$ as in (9.4), there is an $f\in L^1(\mathbb T)$ with

$$\limsup_{n\to\infty}\frac{|S_nf(x)|}{\psi(n)}=\infty\qquad\text{for all }x\in\mathbb T.$$

Bochkarev [16] has a very slight strengthening of this result for Walsh series.

Let $\delta_t$ denote the Dirac point mass at $t\in\mathbb T$. The method of proof is to construct measures

$$\mu=\frac1K\sum_{k=1}^K\delta_{t_k},$$

a set $E\subset\mathbb T$ with measure at least $1/4$, and choices of integers $N$ for which

$$\sup_{n<N}|S_n\mu(x)|>\psi(N),\qquad x\in E.$$

Kolmogorov's example consists of uniformly distributed point masses, whereas Konyagin's example consists of point masses that have a distribution reminiscent of a Cantor set.

9.4. Probabilistic Series. It is of interest, from the point of view of probability and ergodic theory, to consider the versions of the Hilbert transform and Carleson theorem that arise from the integers. Here, we consider the probabilistic versions. Let $\{X_k\}$ be independent and identically distributed copies of a mean zero random variable $X$. The question is whether the series

$$\sum_{k=1}^{\infty}\frac{X_k}{k}$$

converges a.s. Without additional assumption on the distribution of $X$, a necessary and sufficient condition is that $\mathbb E|X|\log(2+|X|)<\infty$. One direction of this is in [68]. If however $X$ is assumed to be symmetric, integrability is necessary and sufficient. This addresses the issue of the Hilbert transform.

Carleson's theorem, in this language, concerns the convergence of the series

$$Y(t)=\sum_{k=1}^{\infty}\frac{X_k}{k}e^{2\pi ikt}\qquad\text{for all }t\in\mathbb T.$$

The role of the quantifiers should be emphasized. Convergence holds for all $t\in\mathbb T$, on a set of full probability. Given this, abstract results on 0-1 laws assure us that if the series converges for all $t$, off of a single set of probability zero, then the limiting function is continuous with probability one. The paper of Talagrand [100] gives necessary and sufficient conditions for the convergence of this series.

Theorem 9.5. Let $X_k$ be independent identically distributed copies of a mean zero symmetric random variable $X$. Then $X\in L(\log\log L)$ if and only if the series $Y(t)$ converges to a continuous function on $\mathbb T$ almost surely.

The assumption of symmetry should be added to the statement of the Theorem in [100]. Cuzick and Lai [34] provide an example of a non-symmetric mean zero $X\in L(\log\log L)$ for which the series $Y(t)$ is divergent. This series is a borderline series in that it just falls out of the scope of the powerful theory of Marcus and Pisier [69] on random Fourier series.

My thanks to several people who provided me with some references for this section. They 
are James Campbell, Ciprian Demeter, Michael Lin, and Anthony Quas. 



9.5. The Wiener–Wintner Question. A formulation of Carleson's operator on $\mathbb Z$ is

$$\mathcal Cf(j):=\sup_\tau\sup_N\Big|\sum_{0<|k|<N}f(j+k)\frac{e^{i\tau k}}{k}\Big|.$$

See Máté [70]. Unaware of this work, which followed soon after Carleson, J. Campbell and Petersen [21] considered this operator on $\ell^2$, with equivalence in $\ell^p$ established by Assani and Petersen [9,21]. Also see Assani, Petersen and White [10], for these and other equivalences. The latter authors had additional motivations from dynamical systems, which we turn to now.

A. Calderón [20] observed that inequalities for operators on $\mathbb Z$ which commute with translation can be transferred to discrete dynamical systems. Let $(X,\mu)$ be a probability space, and $T:X\to X$ a map which preserves the measure $\mu$. Thus, $\mu(T^{-1}A)=\mu(A)$ for all measurable $A\subset X$. A Carleson operator on $(X,\mu,T)$ is

$$C_{\rm mps}f(x):=\sup_\tau\sup_N\Big|\sum_{0<|k|<N}f(T^kx)\frac{e^{i\tau k}}{k}\Big|.$$

And it is a consequence of Calderón's observation and Carleson's theorem that this operator is bounded on $L^2(X)$.
There is however a curious point that distinguishes this case from the other settings of Euclidean groups. Is it the case that one has pointwise convergence of

$$\lim_{N\to\infty}\sum_{0<|k|<N}f(T^kx)\frac{e^{i\tau k}}{k}\quad\text{exists for all }\tau,$$

holding for almost every $x\in X$? The boundedness of the maximal function $C_{\rm mps}$ shows that the set of $f$ for which this holds is closed in $L^2(X)$. The missing ingredient is the dense class for which the convergence above holds. Unlike the setting of Euclidean groups, there is no natural dense class.

This conjecture was posed by Campbell and Petersen [21].

Conjecture 9.6. For all measure preserving systems $(X,\mu,T)$, and all $f\in L^2(X)$, we have the following: for $\mu$-almost every $x$,

$$\lim_{N\to\infty}\sum_{0<|k|<N}f(T^kx)\frac{e^{i\tau k}}{k}\quad\text{exists for all }\tau.$$
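For a concrete and deliberately simple system, the limit in Conjecture 9.6 can be computed. Take, as our own illustrative choice, the rotation $Tx=x+\alpha\pmod 1$ on $[0,1)$ and the eigenfunction $f(y)=e^{2\pi iy}$. Then $f(T^kx)e^{i\tau k}=e^{2\pi ix}e^{ik\theta}$ with $\theta=2\pi\alpha+\tau$, and pairing $k$ with $-k$ turns the sum into $2ie^{2\pi ix}\sum_{k=1}^{N-1}\sin(k\theta)/k$, which converges to $ie^{2\pi ix}(\pi-\theta)$ for $\theta\in(0,2\pi)$ (with $\theta$ reduced mod $2\pi$) and to $0$ when $\theta\equiv0$. Eigenfunctions are the easy case, so this says nothing about the conjecture in general; the sketch only compares partial sums with this closed form. Function names are hypothetical.

```python
import cmath
import math

def ergodic_hilbert_partial(x, alpha, tau, N):
    """sum over 0 < |k| < N of f(T^k x) e^{i tau k} / k for the rotation
    T x = x + alpha (mod 1) and the test function f(y) = exp(2 pi i y)."""
    total = 0j
    for k in range(1, N):
        for j in (k, -k):
            y = (x + j * alpha) % 1.0                      # T^j x
            total += cmath.exp(2j * math.pi * y) * cmath.exp(1j * tau * j) / j
    return total

def predicted_limit(x, alpha, tau):
    """e^{2 pi i x} * i * (pi - theta), theta = (2 pi alpha + tau) mod 2 pi; 0 when theta = 0."""
    theta = (2 * math.pi * alpha + tau) % (2 * math.pi)
    if theta == 0.0:
        return 0j
    return cmath.exp(2j * math.pi * x) * 1j * (math.pi - theta)
```

By Abel summation the error after $N$ terms is at most about $2/(N|\sin(\theta/2)|)$, so a few thousand terms already agree with the closed form to several digits unless $\theta$ is close to $0$.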

A theorem of N. Wiener and A. Wintner [106] provides a classical motivation of this question. This theorem concerns the same phenomena, but with the discrete Hilbert transform replaced by the averages.

Theorem 9.7. For all measure preserving systems $(X,\mu,T)$, and all $f\in L^2(X)$, we have the following:

$$\mu\Big\{x:\lim_{N\to\infty}N^{-1}\sum_{k=0}^{N-1}f(T^kx)e^{i\tau k}\text{ exists for all }\tau\Big\}=1.$$

This theorem admits a simple proof. And note that this theorem trivially supplies a dense class in all $L^p$ spaces, $1<p<\infty$.

The Wiener–Wintner theorem has several interesting variants, for which one can phrase related questions by replacing averages by Hilbert transforms. As far as is known to us, none of these questions has been answered.

An attractive theorem proved by E. Lesigne [66,67] is

Theorem 9.8. For any measure preserving system $(X,\mu,T)$ and all integrable functions $f$, there is a subset $X_f\subset X$ of full measure so that for all $x\in X_f$, all polynomials $p$, and all 1-periodic functions $\phi$, the limit below exists:

$$\lim_{N\to\infty}N^{-1}\sum_{n=1}^{N}\phi(p(n))f(T^nx).$$
Extending this theorem to the Hilbert transform would be an extraordinary accomplishment, whereas if one replaced the discrete dynamical system by flows, it could be that the corresponding result for the Hilbert transform might be within reach.

In connection to this, Arkhipov and Oskolkov [8] have proved the following theorem.

Theorem 9.9. For all integers $d$,

$$\sup_n\sup_{\deg(p)=d}\Big|\sum_{0<|k|<n}\frac{e^{ip(k)}}{k}\Big|<\infty,$$

with the supremum formed over all polynomials of degree $d$.

This is a far more subtle fact than the continuous analog stated in (9.10). Arkhipov and Oskolkov use the Hardy–Littlewood circle method of exponential sums, with the refinements of Vinogradov. See [8,82,83].

By Plancherel, this theorem shows that for a polynomial $p$ which maps the integers to the integers, the operator on the integers given by

$$T_pf(j)=\sum_{k\neq0}\frac{f(j-p(k))}{k}$$

is a bounded operator on $\ell^2(\mathbb Z)$.

E.M. Stein and S. Wainger have established $\ell^2$ mapping properties for certain Radon transforms [97,98].

9.6. E.M. Stein's Maximal Function. A prominent theme of the research of E.M. Stein and S. Wainger concerns oscillatory integrals with polynomial phases. It turns out to be of interest to determine what characteristics of the polynomial govern allied analytic quantities. In many instances, this characteristic is just the degree of the polynomial. For instance, the following is a corollary to a Theorem of Stein and Wainger from 1970 [96]. Namely, one has the bound

$$\sup_{\deg(P)=d}\Big|\mathrm{p.v.}\int_{\mathbb R}e^{iP(y)}\,\frac{dy}{y}\Big|\lesssim1,\qquad d=1,2,\dots.\tag{9.10}$$

A conjecture of Stein's concerns an extension of Carleson's maximal operator to one in which one forms a supremum over all polynomial choices of phase with a fixed degree. Thus,

Conjecture 9.11. For each integer $d$, the maximal function below maps $L^p$ into itself for $1<p<\infty$.

$$C_df(x)=\sup_{\deg(P)=d}\Big|\int e^{iP(y)}f(x-y)\,\frac{dy}{y}\Big|$$

Note that the case of $d=1$ corresponds to Carleson's theorem. Let us set $C'_d$ to be the maximal operator above, but with the restriction that the polynomials $P$ do not have a linear term. It is useful to make this distinction, as it is the linear terms that are intertwined with the Fourier transform.

E.M. Stein [94] considered the purely quadratic terms, and showed that $C'_2$ maps $L^p$ into itself for all $1<p<\infty$. The essence of the matter is the bound on $L^2$, and there his argument is a variant on the method of $TT^*$, emphasizing a frequency decomposition of the operator. Stein and Wainger [99] have proved that $C'_d$ is bounded on all $L^p$'s, for all $d\ge2$. Again the $L^2$ case is decisive and the argument is an application of the $TT^*$ method, but with a spatial decomposition of the operator.

Let us comment in a little more detail about how these results are proved. For the moment, consider a fixed polynomial $P(y)$, and the oscillatory integral

$$T_Pf(x):=\int e^{iP(y)}f(x-y)\,\frac{dy}{y}.\tag{9.12}$$

One may utilize the scale invariance of the Hilbert transform kernel to change variables. With the correct change of variables, one may assume that the polynomial $P(y)=\sum_{j=1}^{d}a_jy^j$ satisfies $\sum_j|a_j|=1$. Then it is evident that for $|y|\le1$, say, the integral above is well approximated by a truncation of the Hilbert transform. Thus, it is those scales of the operator larger than 1 that must be controlled.
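The change of variables is easy to carry out explicitly. Substituting $y=tu$ leaves $dy/y=du/u$ unchanged and replaces each coefficient $a_j$ by $a_jt^j$; since $t\mapsto\sum_j|a_j|t^j$ increases continuously from $0$ to $\infty$ on $(0,\infty)$ for a non-constant $P$, there is a unique $t$ making the sum equal to $1$. The sketch below (function name hypothetical; the polynomial is assumed to have no constant term, as in the text) finds it by bisection.

```python
def normalizing_scale(coeffs, tol=1e-12):
    """Return t > 0 with sum_j |a_j| * t**j == 1, where coeffs[j - 1] = a_j for j = 1..d.

    After y = t*u the polynomial sum_j a_j y**j becomes sum_j (a_j t**j) u**j,
    whose coefficients then have absolute values summing to 1.
    """
    weights = [abs(a) for a in coeffs]
    if not any(weights):
        raise ValueError("polynomial must be non-constant")

    def g(t):
        return sum(w * t ** (j + 1) for j, w in enumerate(weights))

    lo, hi = 0.0, 1.0
    while g(hi) < 1.0:            # g is increasing and unbounded on (0, inf)
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if g(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, $P(y)=5y+3y^2$ gives $t=(\sqrt{37}-5)/6\approx0.1805$, and $P(y)=8y^3$ gives $t=\frac12$.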

It is a consequence of the van der Corput estimates that some additional decay can be obtained from these terms. In particular, one has this estimate. To set notation, in the one dimensional case only, set

$$P_{\vec a}(x)=a_dx^d+\cdots+a_1x,\qquad\vec a=(a_d,\dots,a_1).$$

Lemma 9.13. Let $\chi$ be a smooth bump function. Then we have the estimate

$$\big\|\big[e^{iP_{\vec a}(y)}\chi(y)\big]^\wedge\big\|_\infty\lesssim(1+\|\vec a\|_1)^{-1/d}.$$

In particular, by the Plancherel identity, we have the estimate

$$\big\|\big[e^{iP_{\vec a}(y)}\chi(y)\big]*f\big\|_2\lesssim(1+\|\vec a\|_1)^{-1/d}\|f\|_2.\tag{9.14}$$

Notice that these estimates are better than the trivial ones, and that the second estimate can be interpolated to obtain a range of $L^p$ inequalities for which one has decay, with a rate that depends upon the degree and the $L^p$ space in question.
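For $d=2$, and with a Gaussian in place of the compactly supported bump $\chi$ (an assumption made only because it yields a closed form), Lemma 9.13 can be seen explicitly: $\int e^{iay^2}e^{-y^2}e^{-i\xi y}\,dy=\sqrt{\pi/(1-ia)}\,e^{-\xi^2/(4(1-ia))}$, whose modulus is $\sqrt\pi\,(1+a^2)^{-1/4}e^{-\xi^2/(4(1+a^2))}$. The supremum over $\xi$ is attained at $\xi=0$ and equals $\sqrt\pi(1+a^2)^{-1/4}\approx\sqrt\pi\,a^{-1/2}$, which is the rate $(1+\|\vec a\|_1)^{-1/d}$ with $d=2$. The sketch below (hypothetical names) compares a trapezoid-rule evaluation with this formula.

```python
import cmath
import math

def chirp_ft(a, xi, h=0.01, L=10.0):
    """Trapezoid approximation of int e^{i a y^2} e^{-y^2} e^{-i xi y} dy over [-L, L].

    Here P(y) = a*y**2 (d = 2) and the Gaussian e^{-y^2} is a Schwartz stand-in
    for the bump function chi of Lemma 9.13.
    """
    n = int(round(2 * L / h))
    total = 0j
    for k in range(n + 1):
        y = -L + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * cmath.exp(1j * a * y * y - y * y - 1j * xi * y)
    return total * h

def exact_modulus(a, xi):
    """|sqrt(pi / (1 - i a)) * exp(-xi**2 / (4 (1 - i a)))|."""
    return math.sqrt(math.pi) * (1 + a * a) ** -0.25 * math.exp(-xi * xi / (4 * (1 + a * a)))
```

Because the integrand is a complex Gaussian, the trapezoid rule with step $0.01$ is accurate far beyond the tolerances used here, even for moderately large $a$.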

In discussing extensions of this principle, for example in [94,99], one establishes appropriate extensions of this last lemma, always seeking some additional decay that arises from the polynomial. For instance, in [99], Stein and Wainger prove a far reaching extension of this principle.
Lemma 9.15. There is a constant $\delta>0$, depending only on the degree $d$, so that we have the estimate $\|S_\lambda\|_2\lesssim(1+\lambda)^{-\delta}$, for all $\lambda>0$, where

$$S_\lambda f(x):=\sup_{\substack{\|\vec a\|_1>\lambda\\a_1=0}}\ \sup_{t>0}\big|\big[\operatorname{Dil}^1_t\big(e^{iP_{\vec a}}\chi\big)\big]*f(x)\big|.$$

It is essential that this supremum be formed over polynomials $P_{\vec a}$ which do not have a linear term.

A. Ionescu has pointed out that this Lemma is not true with the linear term included, even in the case of second degree polynomials. The example, which we will see again below, begins by taking a function $f(x)$, and replacing it by the function $g(x)=e^{i\lambda x^2}f(x)$. Then, in the supremum defining $S_\lambda$ above, take the dilation parameter to be $t=1$, and the polynomial to be $P(y)=-\lambda y^2+2\lambda xy$. Note that as we are taking a supremum, we can in particular take a polynomial that depends upon $x$.

In this example, the modulation of $f$ by the "chirp" is then canceled out by the choice of $P$. There is no decay in the estimate. This example is special to the case of the second power, so it is natural to guess that it plays a distinguished role in these considerations.

This also points out an error in the author's paper [57]. (The error enters specifically at the equation (2.9). The phase plane analysis of that paper might yet find some use.) At this point, Stein's conjecture is not settled. And it appears that a positive bound for the operator $C_2$ will in particular require a novel phase plane analysis with quadratic phase. This should be compared to the notion of degeneracy in Section 9.11 below.

9.7. Fourier Series in Two Dimensions. In this section we extend the Fourier transform to functions of the plane

$$\hat f(\xi)=\int_{\mathbb R^2}e^{-i\xi\cdot x}f(x)\,dx,$$

where $\xi=(\xi_1,\xi_2)$, and $x=(x_1,x_2)$. The possible extensions of Carleson's theorem to the two dimensional setting are numerous. The state of our knowledge is not so great.

9.7.1. Fourier Series in Two Dimensions, Part I. Consider the pointwise convergence of the Fourier sums in the plane given by

$$\int_{tP}\hat f(\xi)e^{i\xi\cdot x}\,d\xi,\qquad t>0.$$

Here, $P$ is a polygon with finitely many sides in the plane, with the origin in its interior. Proving the pointwise convergence of these averages is controlled by the maximal function

$$C_Pf(x):=\sup_{t>0}\Big|\int_{tP}\hat f(\xi)e^{i\xi\cdot x}\,d\xi\Big|.$$
C. Fefferman [40] has observed that this maximal operator can be controlled by a sum of operators which are equivalent to the Carleson operator.

For simplicity, we just assume that the polygon is the unit square, centered at the origin, and let

$$\widehat{Qf}(\xi) := \mathbf{1}_{\{|\xi_2| < \xi_1\}}(\xi)\,\hat f(\xi).$$

This is the Fourier projection of $f$ onto the sector swept out by the right hand side of the square. Notice that $\mathcal{C}_P \circ Q$ is the one dimensional Carleson operator applied in the first coordinate, since inside this sector membership in $tP$ depends only on $\xi_1$.

Thus, $\mathcal{C}_P$ is dominated by a sum of four such terms, one for each side of the square, each obtained from the one dimensional Carleson operator and a sector projection (a product of two half-plane projections, hence bounded on $L^p$). So $\mathcal{C}_P$ maps $L^p$ into itself for $1 < p < \infty$. This argument works for any polygon with a finite number of sides. While we have stressed the two dimensional aspect of this argument, it also works in any dimension.
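The decomposition can be checked in a discrete, periodic model. The Python sketch below is our own illustration, not Fefferman's argument verbatim: on an $n \times n$ grid it compares the square partial sum over frequencies with $\max(|k_1|,|k_2|) \le N$ against the sum of four cone projections, each followed by a truncation in a single frequency coordinate. The diagonal tie-breaking and all function names are assumptions of the sketch. Taking the supremum over $N$ of one cone term gives a discrete one dimensional Carleson-type maximal function applied in one coordinate.

```python
import numpy as np

def frequency_grid(n):
    """Integer frequencies (k1, k2) in NumPy FFT order for an n-by-n grid."""
    k = np.rint(np.fft.fftfreq(n) * n).astype(int)
    return np.meshgrid(k, k, indexing="ij")

def square_partial_sum(f_hat, k1, k2, N):
    """Inverse FFT of f_hat restricted to the square max(|k1|, |k2|) <= N."""
    mask = (np.abs(k1) <= N) & (np.abs(k2) <= N)
    return np.fft.ifft2(np.where(mask, f_hat, 0))

def cone_decomposition(f_hat, k1, k2, N):
    """The same partial sum, assembled from four cone projections Q_j.

    The cones partition the nonzero frequencies (diagonal ties go to the top
    and bottom cones). Inside each cone the square condition reduces to a
    truncation in one coordinate, e.g. k1 <= N in the right cone.
    """
    cones = (
        (np.abs(k2) < k1, k1),                  # right:  |k2| < k1
        (np.abs(k2) < -k1, -k1),                # left:   |k2| < -k1
        ((np.abs(k1) <= k2) & (k2 > 0), k2),    # top:    |k1| <= k2
        ((np.abs(k1) <= -k2) & (k2 < 0), -k2),  # bottom: |k1| <= -k2
    )
    total = np.where((k1 == 0) & (k2 == 0), f_hat, 0)  # zero frequency
    for cone, coord in cones:
        q_hat = np.where(cone, f_hat, 0)                # Fourier side of Q_j f
        total = total + np.where(coord <= N, q_hat, 0)  # one-coordinate truncation
    return np.fft.ifft2(total)

rng = np.random.default_rng(0)
f = rng.standard_normal((32, 32))
f_hat = np.fft.fft2(f)
k1, k2 = frequency_grid(32)
print(np.allclose(square_partial_sum(f_hat, k1, k2, 5),
                  cone_decomposition(f_hat, k1, k2, 5)))  # True
```

The same bookkeeping works for any convex polygon with finitely many sides, with one cone per side.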

Nevertheless, it is of some interest to consider maximal operators of the form

$$\sup_{\xi\in\mathbb{R}^2}\Bigl|\int_{\mathbb{R}^2} f(x-y)\,e^{i\xi\cdot y}\,K(y)\,dy\Bigr|,$$

where $K$ is a Calderón–Zygmund kernel. This is the question addressed by P. Sjölin [40], and more recently by Sjölin and Prestini [85] and by Grafakos, Tao and Terwilleger [42].

9.8. Fourier Series in Two Dimensions, Part II. What other methods can be used to sum Fourier series in the plane? One method that comes to mind is summation over arbitrary rectangles. That is, one considers the maximal operator

$$\mathcal{R}f(x) := \sup_{\omega}\Bigl|\int_{\omega}\hat f(\xi)\,e^{i\xi\cdot x}\,d\xi\Bigr|.$$

The supremum is formed over arbitrary rectangles $\omega$ with center at the origin. C. Fefferman [39] has shown, however, that this is a badly behaved operator.

Proposition 9.16. There is a bounded, compactly supported function $f$ for which $\mathcal{R}f = \infty$ on a set of positive measure.

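This maximal operator has an alternate formulation, see (8.6), as

$$\sup_{\alpha,\beta\in\mathbb{R}}\Bigl|\iint f(x-x',y-y')\,e^{i(\alpha x'+\beta y')}\,\frac{dx'}{x'}\,\frac{dy'}{y'}\Bigr|. \tag{9.17}$$

The example of C. Fefferman is a sum of terms $f_\lambda(x,y) = e^{i\lambda xy}\chi(x,y)$, where $\lambda > 3$, and $\chi$ is a smooth bump function satisfying, e.g., $\mathbf{1}_{[-1,1]^2} \le \chi \le \mathbf{1}_{[-2,2]^2}$. The key observation in the construction of the example is the following.

Lemma 9.18. We have the estimate

$$\mathcal{R}f_\lambda(x,y) \gtrsim \log\lambda \qquad \text{for } (x,y) \in [-\tfrac12,\tfrac12]^2.$$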



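Proof. In the supremum over $\alpha$ and $\beta$ in the definition of $\mathcal{R}$, let $\alpha = \lambda y$ and $\beta = \lambda x$, and consider

$$R(x,y) := \iint f_\lambda(x-x',y-y')\,e^{i(\alpha x'+\beta y')}\,\frac{dx'}{x'}\,\frac{dy'}{y'} = e^{i\lambda xy}\int\Bigl[\int e^{i\lambda x'y'}\chi(x-x',y-y')\,\frac{dx'}{x'}\Bigr]\frac{dy'}{y'}.$$

The inside integral in the brackets, call it $I(x,y,y')$, admits these two estimates for all $(x,y) \in [-\tfrac12,\tfrac12]^2$:

$$I(x,y,y') = c\,\mathrm{sign}(\lambda y') + O\bigl((\lambda|y'|)^{-1}\bigr), \qquad I(x,y,y') = O(1+\lambda|y'|),$$

where $c$ is a non-zero constant. Both of these estimates are well known. Then we should estimate

$$e^{-i\lambda xy}R(x,y) = \int_{|y'|<1/\lambda} I(x,y,y')\,\frac{dy'}{y'} + \int_{|y'|>1/\lambda} I(x,y,y')\,\frac{dy'}{y'}.$$

The first term on the right is no more than $O(1)$, and the second term is $\gtrsim \log\lambda$.

□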
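The first of the two "well known" estimates can be illustrated numerically in a simplified model. Assume, only for this sketch, that $x = 0$ and that $\chi(x-x', y-y')$ is replaced by an even bump $\phi(x')$ with $\phi(0) = 1$. Writing $\eta = \lambda y'$ and pairing $x'$ with $-x'$, the principal value integral becomes $2i\int_0^1 \sin(\eta t)\,\phi(t)\,\frac{dt}{t}$, which tends to $i\pi\,\mathrm{sign}(\eta)$; that is, $c = i\pi$ in this model. Integrating $c\,\mathrm{sign}(y')/y' = c/|y'|$ over $1/\lambda < |y'| < 1/2$ then produces the growth $2c\log(\lambda/2)$ behind Lemma 9.18. The bump and function names are our own choices.

```python
import numpy as np

def bump(t):
    """Even smooth bump supported in (-1, 1) with bump(0) = 1 (model for chi)."""
    out = np.zeros_like(t, dtype=float)
    inside = np.abs(t) < 1
    out[inside] = np.exp(1.0 - 1.0 / (1.0 - t[inside] ** 2))
    return out

def modulated_hilbert_at_zero(eta, n=200001):
    """p.v. ∫ e^{i eta t} bump(t) dt / t, computed as 2i ∫_0^1 sin(eta t) bump(t) / t dt.

    np.sinc(z) = sin(pi z) / (pi z), so eta * np.sinc(eta t / pi) equals
    sin(eta t) / t with no division by zero at t = 0.
    """
    t = np.linspace(0.0, 1.0, n)
    dt = t[1] - t[0]
    g = eta * np.sinc(eta * t / np.pi) * bump(t)
    integral = dt * (g.sum() - 0.5 * (g[0] + g[-1]))  # trapezoid rule
    return 2j * integral

for eta in (0.5, 5.0, 50.0, 500.0, -500.0):
    value = modulated_hilbert_at_zero(eta)
    print(f"eta={eta:7.1f}  value={value.real:+.4f}{value.imag:+.4f}i  "
          f"i*pi*sign(eta)={np.pi * np.sign(eta):+.4f}i")
```

For small $|\eta|$ the value is bounded, consistent with the second estimate $O(1+\lambda|y'|)$.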



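This example shows that for each $N > 3$ there is a function $f$, bounded by 1 (namely $f = f_N$), for which

$$\sup_{|\alpha|,|\beta|<N}\Bigl|\iint f(x-x',y-y')\,e^{i(\alpha x'+\beta y')}\,\frac{dx'}{x'}\,\frac{dy'}{y'}\Bigr| \gtrsim \log N, \qquad (x,y)\in[-\tfrac12,\tfrac12]^2.$$

It might be of interest to know if this estimate is best possible.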

The integrals in (9.17) are singular integrals in the product setting. There are as yet no positive results relating to Carleson's theorem in a product setting.



9.8.1. Fourier Series in Two Dimensions, Part III. The exponential $e^{i\xi\cdot x}$ is an eigenfunction of $-\Delta$, the positive Laplacian, with eigenvalue $|\xi|^2$. It would be appropriate to sum Fourier series according to this quantity. This concerns the operator of Fourier restriction to the unit disc,

$$Tf(x) := \int_{|\xi|<1}\hat f(\xi)\,e^{i\xi\cdot x}\,d\xi.$$

It is a famous result of C. Fefferman [41] that $T$ is bounded on $L^p(\mathbb{R}^2)$ if and only if $p = 2$. The fundamental reason for this is the existence of the Besicovitch set, a set of Lebesgue measure zero that contains a unit line segment in every direction. The relevance of this set is indicated by the observation that $T$, localized in frequency to a very small disc centered on the unit circle, is well approximated by a projection onto a half space. Such a projection is a one dimensional Fourier projection performed in the direction normal to the circle, and these normal directions point in arbitrary directions. An extension of Fefferman's argument shows that the Fourier restriction to any smooth set with a curved boundary is a bounded operator only on $L^2$.

Nevertheless, the question of summability in the plane remains open. Namely:

Conjecture 9.19. Is it the case that the maximal operator below maps $L^2(\mathbb{R}^2)$ into weak $L^2(\mathbb{R}^2)$?

$$\sup_{t>0}\Bigl|\int_{|\xi|<t}\hat f(\xi)\,e^{i\xi\cdot x}\,d\xi\Bigr|$$

In the known proofs of Carleson's theorem, the truncations of singular integrals play a distinguished role. In the proof we have presented, this is seen in the Tree Lemma; also cf. 8.13. In view of this, it is interesting to ask, supposing that this conjecture is true, what could play a role similar to the Tree Lemma. It appears to be this.

Conjecture 9.20. Is it the case that the maximal operator below maps $L^2(\mathbb{R}^2)$ into weak $L^2(\mathbb{R}^2)$?

It is conceivable that a positive answer here could lead to a proof of spherical convergence of Fourier series. A recent relevant paper is [22] by Carbery, Gorges, Marletta and Thiele.

9.8.2. Fourier Series in Two Dimensions, Part IV. In order to bridge the gap between Parts I and III, the following question comes to mind. Is there a polygon with infinitely many sides with respect to which one could sum Fourier series?

G. Mockenhaupt pointed out to the author that there is a natural first choice for $P$. It is a polygon $P_{\mathrm{lac}}$ which in the first quadrant has vertices at the points $e^{i\pi 2^{-k}}$ for $k \in \mathbb{N}$. Call this the lacunary sided polygon.

It is a fact due to Córdoba and R. Fefferman [33] that the lacunary sided polygon is a bounded $L^p$ multiplier, for all $1 < p < \infty$. That is, the operator below maps $L^p$ into itself for $1 < p < \infty$.

$$f \mapsto \int_{P_{\mathrm{lac}}}\hat f(\xi)\,e^{i\xi\cdot x}\,d\xi$$
This fact is in turn linked to the boundedness of the maximal function in a lacunary set of directions:

$$M_{\mathrm{lac}}f(x) := \sup_{k\in\mathbb{N}}\ \sup_{t>0}\,(2t)^{-1}\int_{-t}^{t}\bigl|f(x - u(1,2^{-k}))\bigr|\,du.$$

Note that this is a one dimensional maximal function computed in a set of directions in the plane that, in a strong sense, is zero dimensional. Let us state the following as a conjecture.

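Conjecture 9.21. For $2 < p < \infty$, the maximal function below maps $L^p(\mathbb{R}^2)$ into itself.

$$\sup_{t>0}\Bigl|\int_{tP_{\mathrm{lac}}}\hat f(\xi)\,e^{i\xi\cdot x}\,d\xi\Bigr|$$

Even a restricted version of this conjecture remains quite challenging. In analogy to Conjecture 9.20:

Conjecture 9.22. For $2 < p < \infty$, the maximal function below maps $L^p(\mathbb{R}^2)$ into itself.

$$\sup_{k\in\mathbb{N}}\Bigl|\int_{(1+2^{-k})P_{\mathrm{lac}}}\hat f(\xi)\,e^{i\xi\cdot x}\,d\xi\Bigr|$$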


Another question, with a somewhat more quantitative focus, considers uniform polygons with $N$ sides, and seeks norm bounds for these two maximal operators, on $L^2$ say, which grow logarithmically in $N$. We do not have a good conjecture as to the correct order of growth of these constants. If one could prove that the bounds were independent of $N$, then the spherical summation conjecture would be a consequence.



9.8.3. Fourier Series in Two Dimensions, Part V. Again in the plane, consider the Fourier restriction to the semi infinite rectangles

$$S_n f(x) := \int_{-\infty}^{n}\int_{-\infty}^{n^2}\hat f(\xi)\,e^{i\xi\cdot x}\,d\xi_2\,d\xi_1.$$

Thus, we are restricting to a semi infinite rectangle with vertex $(n, n^2)$ on a parabola. Consider the maximal operator

$$Sf := \sup_{n>0}|S_n f|.$$

The question then concerns the boundedness of this operator on $L^p$ spaces.

To this end, we remark that a very nice argument of C. Fefferman [40] shows that the 
9.9. The Bilinear Hilbert Transforms. The bilinear Hilbert transforms are given by

$$H_\alpha(f,g)(x) := \mathrm{p.v.}\int f(x-\alpha y)\,g(x-y)\,\frac{dy}{y}, \qquad \alpha\in\mathbb{R},$$

with the convention that $H_0(f,g) = f\,Hg$ and $H_\infty(f,g) = (Hf)\,g$. A third degenerate value is $\alpha = 1$.

These transforms commute with appropriate joint translations of $f$ and $g$, and dilations of $f$ and $g$. They are related to Carleson's theorem through the observation that for $\alpha \notin \{0, 1, \infty\}$, $H_\alpha$ enjoys an invariance property with respect to modulation. Namely, with $\mathrm{Mod}_\theta f(x) := e^{i\theta x}f(x)$,

$$H_\alpha(\mathrm{Mod}_\theta f,\ \mathrm{Mod}_{-\alpha\theta}\,g) = \mathrm{Mod}_{(1-\alpha)\theta}\,H_\alpha(f,g).$$

That is, the bilinear Hilbert transforms share the essential characteristics of the Carleson operator.
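The modulation identity is purely algebraic, so it survives discretization. The Python sketch below uses a truncated discrete model of $H_\alpha$ on finite sequences, with an integer $\alpha$ and zero extension outside the index range; these simplifications and the function names are ours, not the operator of the text. It checks that $H_\alpha(\mathrm{Mod}_\theta f, \mathrm{Mod}_{-\alpha\theta} g) = \mathrm{Mod}_{(1-\alpha)\theta} H_\alpha(f,g)$ up to rounding, while a generic pair of modulations is not intertwined in this way.

```python
import numpy as np

def discrete_bilinear_hilbert(f, g, a, M):
    """Toy model of H_a: sum over 0 < |m| <= M of f[n - a m] g[n - m] / m.

    f and g are finite sequences extended by zero; a is an integer.
    """
    N = len(f)
    out = np.zeros(N, dtype=complex)
    for n in range(N):
        for m in range(-M, M + 1):
            i, j = n - a * m, n - m
            if m != 0 and 0 <= i < N and 0 <= j < N:
                out[n] += f[i] * g[j] / m
    return out

def modulate(f, theta):
    """Mod_theta f[n] = e^{i theta n} f[n]."""
    return np.exp(1j * theta * np.arange(len(f))) * f

rng = np.random.default_rng(1)
f, g = rng.standard_normal(40), rng.standard_normal(40)
a, M, theta = 2, 10, 0.7

lhs = discrete_bilinear_hilbert(modulate(f, theta), modulate(g, -a * theta), a, M)
rhs = modulate(discrete_bilinear_hilbert(f, g, a, M), (1 - a) * theta)
print(np.allclose(lhs, rhs))  # True: the phases theta*(n - a m) - a*theta*(n - m) collapse to (1 - a)*theta*n

# Modulating both functions by theta leaves an m-dependent phase inside the sum.
other = discrete_bilinear_hilbert(modulate(f, theta), modulate(g, theta), a, M)
print(np.allclose(other, modulate(discrete_bilinear_hilbert(f, g, a, M), 2 * theta)))
```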

It was the study of these transforms that led Lacey and Thiele to the proof of Carleson's theorem presented here. The bilinear Hilbert transforms are themselves interesting objects, with surprising properties. Indeed, it is natural to ask what $L^p$ mapping properties are enjoyed by these transforms. Note that in the integral, the term $dy/y$ is dimensionless, so that the $L^p$ mapping properties should be those of Hölder's inequality. Thus, $H_\alpha$ should map $L^2 \times L^2$ into $L^1$. Note that this is false in the degenerate case $\alpha = 1$, as the Hilbert transform does not preserve $L^1$. Nevertheless, this was conjectured by A. Calderón in the non-degenerate cases.

See [61–64] for a proof of the following theorem.

Theorem 9.23. For $1 < p, q < \infty$, if $0 < 1/r = 1/p + 1/q < 3/2$ and $\alpha \notin \{0, 1, \infty\}$, then

$$\|H_\alpha(f,g)\|_r \lesssim \|f\|_p\,\|g\|_q.$$

We should mention that in a certain sense the proof of this theorem is easier than that for 
Carleson's operator. The proof outlined in [56] contains the notions of tiles, trees, and size. 
But the estimate that corresponds to the tree lemma is a triviality. The reason for this gain 
in simplicity is that there is no need for a mechanism to control a supremum. 

The subject of multilinear operators with modulation invariance has inspired a large number of results, and is worthy of a survey of its own. We refer the reader to C. Thiele's article [105] for a survey of recent activity in this area.

9.10. The Bilinear Maximal Functions. The theories of the one dimensional Hilbert transform and maximal function are intimately related, hence it is natural to consider the bilinear maximal functions

$$M_\alpha(f,g)(x) := \sup_{t>0}\,(2t)^{-1}\int_{-t}^{t}|f(x-\alpha y)\,g(x-y)|\,dy.$$
For this operator, certain bounds are immediately available. Namely, for conjugate indices $\frac1p + \frac1{p'} = 1$, we have

$$M_\alpha(f,g) \le \bigl(M|f|^p\bigr)^{1/p}\bigl(M|g|^{p'}\bigr)^{1/p'}.$$

Thus, the known $L^p$ inequalities for the maximal function are available, showing, e.g., that the bilinear maximal function maps $L^2 \times L^2$ into weak $L^1$.
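The pointwise bound above is Hölder's inequality followed by a change of variables. For $\alpha \ne 0$ and each $t > 0$,

$$\frac{1}{2t}\int_{-t}^{t}|f(x-\alpha y)\,g(x-y)|\,dy \le \Bigl(\frac{1}{2t}\int_{-t}^{t}|f(x-\alpha y)|^{p}\,dy\Bigr)^{1/p}\Bigl(\frac{1}{2t}\int_{-t}^{t}|g(x-y)|^{p'}\,dy\Bigr)^{1/p'},$$

and the substitution $u = \alpha y$ gives

$$\frac{1}{2t}\int_{-t}^{t}|f(x-\alpha y)|^{p}\,dy = \frac{1}{2|\alpha| t}\int_{-|\alpha| t}^{|\alpha| t}|f(x-u)|^{p}\,du \le M(|f|^p)(x),$$

with $M$ the centered Hardy–Littlewood maximal function; taking the supremum over $t > 0$ gives the stated inequality. For $p = p' = 2$, the factor $(M|f|^2)^{1/2}$ lies in weak $L^2$ when $f \in L^2$, since $M$ maps $L^1$ into weak $L^1$, and a product of two weak $L^2$ functions lies in weak $L^1$.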

But the bilinear Hilbert transform maps into certain spaces $L^r$ for $2/3 < r < 1$, so it is natural to ask if the bilinear maximal function does as well. Indeed, this is true.

Theorem 9.24. For $1 < p, q < 2$, if $1 < 1/r = 1/p + 1/q < 3/2$ and $\alpha \notin \{0, 1, \infty\}$, then

$$\|M_\alpha(f,g)\|_r \lesssim \|f\|_p\,\|g\|_q.$$

The proof of the author [53] begins by observing that the maximal function can be bounded by truncations of a singular integral operator, with an appropriately chosen kernel. Essentially, it is enough to bound the maximal truncations of the bilinear Hilbert transform given by

$$\sup_{k\in\mathbb{Z}}\Bigl|\int_{|y|<2^k} f(x-\alpha y)\,g(x-y)\,\frac{dy}{y}\Bigr|.$$

These maximal operators obey the same inequalities as in Theorem 9.23, and this is the main theorem in

The central point is to replace the Size Lemma above by a suitable maximal variant. This can be done, but one must appeal to a fundamental maximal inequality found by Bourgain [18]. The reader can also consult the paper of Thiele [104], which presents the entire proof in the Walsh context, where many of the technical difficulties are minimized.

Demeter, Tao and Thiele have revisited these issues. Strikingly, they found an argument which provides an $\epsilon$ improvement over the trivial $L^2 \times L^2 \to$ weak $L^1$ bound mentioned above. Moreover, the argument uses arithmetic combinatorics. See [36].

9.11. Multilinear Oscillatory Integrals. Consider the bilinear oscillatory integral

$$B_2(f_1,f_2)(x) := \mathrm{p.v.}\int e^{iy^2} f_1(x-y)\,f_2(x+y)\,\frac{dy}{y}.$$

This is seen to be a disguised form of a bilinear Hilbert transform. Setting $g_j(x) := e^{-ix^2/2}f_j(x)$, and using $(x-y)^2 + (x+y)^2 = 2x^2 + 2y^2$, one sees that

$$B_2(g_1,g_2)(x) = e^{-ix^2}\,\mathrm{p.v.}\int f_1(x-y)\,f_2(x+y)\,\frac{dy}{y}.$$

(Compare this to Ionescu's example mentioned at the end of Section 9.6.) As it turns out, for a polynomial of any other degree, the integral above is bounded. The proof demonstrates a multilinear variant of the van der Corput type inequality, of which Lemma 9.13 is just one example.

More generally, M. Christ, Li, Tao and Thiele [30] define multilinear functionals

$$\Lambda_\lambda(f_1, f_2, \dots, f_n) = \int_{\mathbb{R}^m} e^{i\lambda P(x)}\prod_{j=1}^{n} f_j(\pi_j(x))\,\eta(x)\,dx, \tag{9.25}$$

where $\lambda \in \mathbb{R}$ is a parameter, $P : \mathbb{R}^m \to \mathbb{R}$ is a real-valued polynomial, $m \ge 2$, and $\eta \in C_0^1(\mathbb{R}^m)$ is compactly supported. Each $\pi_j$ denotes the orthogonal projection from $\mathbb{R}^m$ to a linear subspace $V_j \subset \mathbb{R}^m$ of any dimension $\kappa \le m-1$, and $f_j : V_j \to \mathbb{C}$ is always assumed to be locally integrable with respect to Lebesgue measure on $V_j$.

Notice that by taking $n = 3$, and taking projections $\pi_j : \mathbb{R}^2 \to V_j$ where

$$V_1 = \{(x,x) : x\in\mathbb{R}\},\quad V_2 = \{(x,-x) : x\in\mathbb{R}\},\quad V_3 = \{(x,0) : x\in\mathbb{R}\}, \tag{9.26}$$

we can recover, for instance, a bilinear Hilbert transform.

They say that a polynomial $P$ has a power decay property if there is a $\delta > 0$ so that for all $f_j \in L^\infty(V_j)$, we have the estimate

$$|\Lambda_\lambda(f_1,\dots,f_n)| \lesssim (1+|\lambda|)^{-\delta}\prod_{j=1}^{n}\|f_j\|_\infty.$$

From this estimate, a range of power decay estimates hold in all relevant products of $L^p$ spaces. This should be compared to Lemma 9.13 and in particular (9.14) above.

Clearly, there are obstructions to a power decay property, and these obstructions can be formalized in a definition. A polynomial $P$ is said to be degenerate (relative to $\{V_j\}$) if there exist polynomials $p_j : V_j \to \mathbb{R}$ such that $P = \sum_{j=1}^{n} p_j \circ \pi_j$. Otherwise $P$ is nondegenerate. In the case $n = 0$, where the collection of subspaces $\{V_j\}$ is empty, $P$ is considered to be nondegenerate if and only if it is nonconstant. And in the example (9.26), we see that $P(x,y) = 2x^2 + 2y^2 = (x+y)^2 + (x-y)^2$ is degenerate.
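For the configuration (9.26), degeneracy reduces to linear algebra. A polynomial on $V_1$, $V_2$ or $V_3$, composed with the projection, is a polynomial in $x+y$, $x-y$ or $x$ respectively, and its degree-$k$ homogeneous part is a multiple of the $k$-th power of that linear form. Hence $P$ is degenerate exactly when each homogeneous component of $P$ lies in the span of the corresponding powers. The Python sketch below (helper names are ours) confirms that every quadratic, such as $y^2$ or $2x^2+2y^2$, is degenerate here, while the cubic $y^3$ is not. This is consistent with the special role of the quadratic phase in $B_2$ noted above.

```python
import numpy as np
from math import comb

def homogeneous_coeffs(a, b, d):
    """Coefficients of (a x + b y)^d in the basis x^d, x^{d-1} y, ..., y^d."""
    return np.array([comb(d, k) * a ** (d - k) * b ** k for k in range(d + 1)], dtype=float)

# Linear forms x + y, x - y and x, matching the subspaces V_1, V_2, V_3 of (9.26).
FORMS_926 = ((1, 1), (1, -1), (1, 0))

def is_degenerate_homogeneous(target, d, forms=FORMS_926):
    """Is the degree-d homogeneous polynomial with coefficients `target`
    (same basis as above) of the form p_1(x+y) + p_2(x-y) + p_3(x)?

    This holds iff target lies in the span of the d-th powers of the forms,
    which we test by comparing matrix ranks.
    """
    basis = np.column_stack([homogeneous_coeffs(a, b, d) for a, b in forms])
    augmented = np.column_stack([basis, target])
    return np.linalg.matrix_rank(augmented) == np.linalg.matrix_rank(basis)

examples = {
    "2x^2 + 2y^2": (np.array([2.0, 0.0, 2.0]), 2),
    "y^2": (np.array([0.0, 0.0, 1.0]), 2),
    "x y^2": (np.array([0.0, 0.0, 1.0, 0.0]), 3),
    "y^3": (np.array([0.0, 0.0, 0.0, 1.0]), 3),
}
for name, (coeffs, d) in examples.items():
    print(name, "degenerate" if is_degenerate_homogeneous(coeffs, d) else "nondegenerate")
```

A non-homogeneous $P$ is handled by testing each homogeneous component separately.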

It is natural to conjecture that nondegeneracy is sufficient for a power decay property. This is verified in a wide range of special cases in the paper of Christ, Li, Tao, and Thiele [30], by a range of interesting techniques. It is of interest to determine if the natural conjecture here is indeed correct.



9.12. Hilbert Transform on Smooth Families of Lines. This question has its beginnings in the Besicovitch set, which we already mentioned in connection to spherical summation of Fourier series. One may construct Besicovitch sets with these properties. For choices of $0 < \epsilon, \alpha < 1$, there is a Besicovitch set $K$ in the square $[0,4]^2$, say, for which $K$ has measure at most $\epsilon$, and there is a function $v : \mathbb{R}^2 \to \mathbb{T}$, so that for a set of $x$'s in $[0,4]^2$ of measure $\ge 1$, $K \cap \{x + t v(x) : t \in \mathbb{R}\}$ contains a line segment of length one, and $v$ is Hölder continuous of order $\alpha$.

One can ask if the Hölder continuity condition is sharp. A beautiful formulation of a conjecture in this direction is attributed to A. Zygmund.

Conjecture 9.27. Let $v : \mathbb{R}^2 \to \mathbb{T}$ be Hölder continuous (of order 1). Then for all square integrable functions $f$,

$$\lim_{\epsilon\to0}\frac{1}{2\epsilon}\int_{-\epsilon}^{\epsilon} f(x - y\,v(x))\,dy = f(x) \qquad \text{for a.e. } x.$$

This is a differentiability question, on a choice of lines specified by $v$. The only stipulation is that $v$ is Hölder continuous. This is only known under more stringent conditions on $v$, such as analytic, due to E.M. Stein [93], or real analytic, due to Bourgain [19]. There is a partial result due to N. Katz [47] (also see [46]) that demonstrates at worst "log log" blowup assuming the Hölder continuity of $v$. The question is open, even if one assumes that $v \in C^{1000}$.

The difficulty in this problem arises from those points at which the gradient of $v$ is degenerate; assumptions such as analyticity certainly control such degeneracies.

E.M. Stein [93] posed the Hilbert transform variant: namely, defining

$$H_v f(x) := \mathrm{p.v.}\int_{-1}^{1} f(x - y\,v(x))\,\frac{dy}{y},$$

is it the case that there is a constant $c$ so that if $\|v\|_{\mathrm{Lip}} < c$, then $H_v$ maps $L^2(\mathbb{R}^2)$ into itself? A curious fact about this question is that this inequality, if known, implies Carleson's theorem for one dimensional Fourier series.

To see this, observe that the symbol for the transform is $\psi(\xi\cdot v(x))$, where $\psi$ is the Fourier transform of $\mathbf{1}_{\{|y|<1\}}\,\frac{dy}{y}$. Suppose the vector field is of the form $v(x) = (1, v(x_1))$, where we need only assume that $v$ is Hölder continuous (of order 1) with norm 1, say, and consider the trace of the symbol on the line $\xi_2 = -N$. Then the symbol is $\psi((\xi_1,-N)\cdot(1,v(x_1))) = \psi(\xi_1 - N v(x_1))$. We conclude that this symbol defines a bounded linear operator on $L^2(\mathbb{R})$, with a bound that is independent of $N$. That is, for any Lipschitz function $v(x_1)$ and any $N > 1$, the symbol $\psi(\xi_1 - N v(x_1))$ is the symbol of a bounded linear operator on $L^2(\mathbb{R})$. By varying $N$ and $v$, we may replace $N v(x_1)$ by an arbitrary measurable function. This is the substance of Carleson's theorem.
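For orientation, here is the multiplier $\psi$ computed explicitly. We assume that $H_v$ is the truncated operator $H_v f(x) = \mathrm{p.v.}\int_{-1}^{1} f(x - y\,v(x))\,\frac{dy}{y}$ and that the Fourier transform is normalized as $\hat f(\xi) = \int e^{-ix\cdot\xi} f(x)\,dx$; other conventions change $\psi$ only by a sign or a reflection. Writing $\mathrm{Si}(s) = \int_0^s \frac{\sin u}{u}\,du$,

$$\psi(\eta) = \mathrm{p.v.}\int_{-1}^{1} e^{-iy\eta}\,\frac{dy}{y} = -2i\int_0^1 \frac{\sin(y\eta)}{y}\,dy = -2i\,\mathrm{Si}(\eta).$$

Since $\mathrm{Si}$ is bounded and $\mathrm{Si}(s) \to \pm\frac{\pi}{2}$ as $s \to \pm\infty$, the multiplier $\psi(\xi_1 - N v(x_1))$ is bounded uniformly in $N$, and for large $|\xi_1 - N v(x_1)|$ it is close to $-i\pi\,\mathrm{sign}(\xi_1 - N v(x_1))$. Heuristically, the operator it defines is then a combination of the identity and the partial Fourier projection onto $\{\xi_1 > N v(x_1)\}$, which is the kind of operator controlled by Carleson's theorem.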

But the implication is entirely one way: a positive answer to the family of lines question seems to require techniques quite a bit more sophisticated than those that imply Carleson's theorem. Recently Lacey and Li [59, 60] have been able to obtain a partial answer, assuming only that the vector field has $1 + \epsilon$ derivatives.








Theorem 9.28. Assume that $v \in C^{1+\epsilon}$ for some $\epsilon > 0$. Then the operator $H_v$ is bounded on $L^2(\mathbb{R}^2)$. The norm of the operator is at most

$$\|H_v\|_{2\to2} \lesssim \bigl[1 + \log_+\|v\|_{C^{1+\epsilon}}\bigr]^2.$$

9.13. Schrödinger Operators, Scattering Transform. There is a beautiful line of investigation relating Schrödinger equations in one dimension to aspects of the Fourier transform, and in particular, Carleson's theorem. There is a further connection to scattering transforms and nonlinear Fourier analysis. All in all, these topics are extremely broad, with several different sets of motivations, and a long list of contributors.

We concentrate on a succinct way to see the connection to Carleson's theorem, an observation made explicitly by M. Christ and A. Kiselev [25,26]; also see C. Remling [86]. The basic object is a time independent Schrödinger operator on the real line,

$$H = -\frac{d^2}{dx^2} + V,$$

where $V$ is an appropriate potential on the real line. The idea is that if $V$ is small, in some specific senses, then the spectrum of $H$ should resemble that of $-\frac{d^2}{dx^2}$. In particular, eigenfunctions should be perturbations of the exponentials.

Standard examples show that one should seek to show that for almost all $\lambda$, the eigenfunctions of energy $\lambda^2$, that is, the solutions to

$$(H - \lambda^2 I)\,u = 0,$$

are bounded perturbations of $e^{\pm i\lambda x}$.

Seeking such an eigenfunction, one can formally write

$$u(x) = e^{i\lambda x} + \frac{1}{\lambda}\int_x^\infty \sin(\lambda(y-x))\,V(y)\,u(y)\,dy.$$

Iterating this formula, again formally, one has

$$u(x) = e^{i\lambda x} + \frac{1}{\lambda}\int_x^\infty \sin(\lambda(y-x))\,V(y)\,e^{i\lambda y}\,dy \tag{9.29}$$

$$\qquad + \frac{1}{\lambda^2}\iint_{x<y_1<y_2}\sin(\lambda(y_1-x))\,\sin(\lambda(y_2-y_1))\,V(y_1)\,V(y_2)\,u(y_2)\,dy_1\,dy_2. \tag{9.30}$$

Observe that the integral term in (9.29) no longer contains $u$, and is a linear combination of

$$e^{-i\lambda x}\int_x^\infty e^{2i\lambda y}\,V(y)\,dy \tag{9.31}$$

and

$$e^{i\lambda x}\int_x^\infty V(y)\,dy. \tag{9.32}$$

One seeks estimates of these in the mixed norm space of, say, $L^2_\lambda L^\infty_x$. From such estimates, one deduces that for almost all $\lambda$, there is an eigenfunction which is a perturbation of $e^{i\lambda x}$.
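The first-order expansion can be checked numerically. The Python sketch below rests on our own assumptions: $\lambda = 3$, a potential $V = \epsilon\cdot$(smooth bump), and the Volterra form $u(x) = e^{i\lambda x} + \lambda^{-1}\int_x^\infty \sin(\lambda(y-x))V(y)u(y)\,dy$. That form solves $-u'' + Vu = \lambda^2 u$ and equals $e^{i\lambda x}$ to the right of the support of $V$. The sketch integrates the ODE with a hand-written RK4 scheme and compares $u - e^{i\lambda x}$ with the first-order term assembled from the oscillatory integral $e^{-i\lambda x}\int_x^\infty e^{2i\lambda y}V(y)\,dy$ and the non-oscillatory integral $e^{i\lambda x}\int_x^\infty V(y)\,dy$. The remainder should be of second order in $\epsilon$, reflecting the double integral in the iterated expansion.

```python
import numpy as np

def bump(y):
    """Smooth bump supported in (-1, 1); an illustrative potential shape."""
    out = np.zeros_like(y, dtype=float)
    inside = np.abs(y) < 1
    out[inside] = np.exp(1.0 - 1.0 / (1.0 - y[inside] ** 2))
    return out

def jost_solution(lam, eps, x):
    """Solve -u'' + eps*bump*u = lam^2 u with u = e^{i lam x} at the right end,
    integrating from x[-1] down to x[0] with classical RK4."""
    h = x[1] - x[0]
    def rhs(s, state):
        u, du = state
        v = eps * bump(np.array([s]))[0]
        return np.array([du, (v - lam**2) * u])
    state = np.array([np.exp(1j * lam * x[-1]), 1j * lam * np.exp(1j * lam * x[-1])])
    out = np.empty(len(x), dtype=complex)
    out[-1] = state[0]
    for k in range(len(x) - 1, 0, -1):
        s = x[k]
        k1 = rhs(s, state)
        k2 = rhs(s - h / 2, state - h / 2 * k1)
        k3 = rhs(s - h / 2, state - h / 2 * k2)
        k4 = rhs(s - h, state - h * k3)
        state = state - h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        out[k - 1] = state[0]
    return out

def tail_integral(g, h):
    """Trapezoid approximations of the integrals from x[k] to x[-1], for every k."""
    pieces = 0.5 * h * (g[1:] + g[:-1])
    return np.append(np.cumsum(pieces[::-1])[::-1], 0.0)

def born_term(lam, eps, x):
    """(1/lam) ∫_x^∞ sin(lam (y - x)) V(y) e^{i lam y} dy, split into the two integrals."""
    h = x[1] - x[0]
    v = eps * bump(x)
    osc = tail_integral(np.exp(2j * lam * x) * v, h)  # ∫_x^∞ e^{2 i lam y} V(y) dy
    flat = tail_integral(v, h)                          # ∫_x^∞ V(y) dy
    return (np.exp(-1j * lam * x) * osc - np.exp(1j * lam * x) * flat) / (2j * lam)

x = np.linspace(-2.0, 2.0, 4001)
lam = 3.0
for eps in (0.01, 0.005):
    u = jost_solution(lam, eps, x)
    first = born_term(lam, eps, x)
    rest = np.max(np.abs(u - np.exp(1j * lam * x) - first))
    print(f"eps={eps}: max|first-order|={np.max(np.abs(first)):.2e}  max|remainder|={rest:.2e}")
```

Halving $\epsilon$ should cut the remainder by roughly a factor of four, while the first-order term only halves.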

Concerning (9.31), notice that if $V \in L^2$, we can, by Plancherel, regard $V$ as $\hat f$ for some $f \in L^2$. The desired estimate is then a consequence of Carleson's theorem. This is indicative of the distinguished role that $L^2$ plays in this subject, and also of the intertwining of the roles of frequency and time that occurs in the subject.

Concerning (9.32), unless $V \in L^1$, there is no reasonable interpretation that can be placed on this term. In practice, a different approach than the one given here must be adopted.

If one continues the expansion in (9.30) , one gets a bilinear operator with features that 
resemble both the Carleson operator, and the bilinear Hilbert transform. See the papers by 
C. Muscalu, T. Tao, and C. Thiele [74,75,79,81]. 

We refer the reader to these papers by M. Christ and A. Kiselev [25-28]. For a survey of 
this subject, see M. Christ and A. Kiselev [29]. The reader should also consult the ongoing 
investigations of C. Muscalu, T. Tao, and C. Thiele [80]. This paper begins with an interesting 
summary of the perspective of the nonlinear Fourier transform. 